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ABSTRACT 


In recent years the analysis of crustal deformation measurements has 
become important as a result of current improvements in geodetic 
methods and an increasing amount of theoretical and observational data 
provided by several earth sciences. 

In this study, a ''first-generation” data analysis algorithm which 
combines a priori information with current geodetic measurements was 
proposed. Relevant methods which can be used in the algorithm have 
been discussed. Prior information is the unifying feature of this 

algorithm. Some of the problems which may arise through the use of a 
priori information in the analysis have been indicated and preventive 
measures were demonstrated. 

The first step in the algorithm is the optimal design of deformation 
networks. As an example of deformation model oriented designs, it was 
shown that regular polygonal deformation networks composed of 
equilateral triangles are uniformly D-optimal for a homogeneous 
deformation field. 

The second step in the algorithm identifies the descriptive model of 
the deformation field. A method based on the entropy measure of 
information, proposed by Box and Hill (1967), was applied to a group of 
postulated deformation models and the identification of the correct model 
was demonstrated through an example. 

The final step in the algorithm is the improved estimation of 
deformation parameters. Although deformation parameters are estimated 
in the process of model discrimination, they can further be improved 
by the use of a priori information about them. According to the 
proposed algorithm this information must first be tested against the 
estimates calculated using the sample data only. Null-hypothesis testing 
procedures were developed for this purpose. 

Six different estimators which employ a priori information were 
examined. Emphasis was put on the case when the prior information is 
wrong and analytical expressions for possible improvements under 
incompatible prior information were derived. 
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Chapter 1 
INTRODUCTION 


1.1 BACKGROUND 

Almost half a century after the theory of continental drift was first 
proposed by Wegener in 1912| the plate tectonic hypothesis was put 
forward by Harris Hiss, J. T. Wilson and others (NASA, 1978). Since 
then it has grown up mainly as a result of an overwhelming body of 
new geological and geophysical evidence that supports the plate 
tectonics idea. 

The plate tectonics model depicts the outer shell of the earth, the 
lithosphere, as broken into small number of large plates moving relative 
to each other and with boundaries marked by the earthquake zones. 

They converge along the seismically active continental margins and 
arcs. They diverge along the axes of the ocean ridges. They slide 
along each other in areas like the San Andreas rift zone in California 
(Drake, 1983). 

Although the plate tectonics idea is based on a simple model, its 
surface manifestations as crustal deformations are complex in nature. In 
spatial spectrum, tectonic plates are assumed to behave as rigid blocks 
to the first order (global scale phenomenon). Their motions, as rigid 
entities, may reach 1-10 centimeters per year. Yet, they undergo severe 
deformations at their boundaries. Regional scale phenomena take place 
over distances less than the dimensions of typical plates, a few 
thousand kilometers, but larger than a few hundred kilometers. Local 
scale phenomena occur in the immediate vicinity of a fault and it is 
closely related to the regional deformations. 

In the temporal spectrum, long term strain accumulations (1-10 
years) are identified in regional and local scales. Medium term episodic 
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changes (less than a year) and short term precursory crustal motions 
before large earthquakes are possible (NRC| 1981). 

In recent years, the improvements in geodetic instrumentation and 
measurement techniques have made feasible the detection of crustal 
movements over reasonably short time intervals. Hence, the 
interpretation of recent crustal movements is now in the realm of 
Geodesy. 

Today the classical ground based geodetic methods provide relative 
position and position changes over short distances with a precision of 
0.1 ppm (Slater et al., 1983). They are useful for determining regional 
and local strain rates on the plate boundaries. Space techniques are 
far superior than the classical techniques for determining plate tectonic 
motion and large scale deformation of the plates. They are precise 
enough (on the order of 0.1 ppm - 0.7 ppm for 200 - 6000 km baselines) 
to detect possible plate motions which may reach a few centimeters per 
year (Coates et al., 1985, Davidson et al., 1985). 

Geodetic observations can provide information about the overall rate 
and direction of relative motions among the plates. It is now possible to 
investigate whether these motions vary along the length of the 
boundary, whether all these motions are accommodated or do all of the 
accommodations to plate motion take place along plate boundaries or are 
some distributed over a broad area. Detailed configuration of the strain 
field around the ends of locked segments of the fault zones can be 
mapped using geodetic observations. Temporal variations can be 
detected. Precursory strain changes in local and regional scales are 
important for understanding earthquake mechanisms. 

Today there is an increasing number of repeated geodetic 
observations and additional information about the nature of crustal 
movements provided by several other earth sciences. The analysis and 
the interpretation of this accumulated data are now more important than 
ever. Meanwhile, a need for improvement of the current analysis 
techniques has already been recognized (NRC, 1981). 
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1.2 PURPOSE 


The purpose of this study is to exploit the use of prior information in 
the analysis and interpretation of recent crustal movement measurements 
by the geodetic methods. 

Two specific objectives are under consideration. The first is to 
develop a general data analysis algorithm in which prior information is 
introduced into the analysis of geodetic data. The second is to propose 
and elaborate on relevant methods which can be employed in this 
algorithm. 

Most of the time geodesists possess some knowledge about the 
crustal deformation phenomenon under consideration. This knowledge 
may come from geological or geophysical investigations, from purely 
theoretical considerations or it may have been derived from other 
geodetical measurements. This type of knowledge is referred to as 
prior, auxiliary or extraneous information. It comes from outside the 
current geodetic observations themselves. It may be qualitative or 
quantitative. 

The desirability of exploiting this extraneous information should be 
clear. The more the geodesists know, the more effectively they can plan 
and analyze. It is intuitive that a gain in efficiency would result at 
different stages of the investigation by the use of additional 
information. The introduction of quantitative information can improve 
the estimation of relevant deformation parameters under controlled 
circumstances. 

Yet, the introduction of prior information needs serious attention 
since there is always a likelihood that it may not be compatible with the 
sample data, i.e., observations. The following questions may arise upon 
introduction of extraneous information: 

1. If a variation of descriptive models is available in modeling 
crustal deformations, how should geodetic surveys be designed 
in order to discriminate among these models? 

2. If prior information exists about the deformations, when should 
they be introduced into the analysis? 
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3. If prior information is chosen to be used in the estimation, 
which estimator should be used? What can be gained by using 
prior information? 

4. If discrepancies exist between prior and the information implied 
by the geodetic measurements, can they still be combined to 
improve the analysis? If so, under what conditions? 

Relevant methods to deal with the aforementioned questions are 
proposed. Their interactions are demonstrated in the proposed general 
data analysis algorithm. 


1.3 SCOPE AND ORGANIZATION 

Chapter 2 starts with a short review of the geodetic aspects of 
local, regional and global deformations. A general data analysis 
algorithm is then proposed. The overall approach of the algorithm is an 
inductive one. "Every solution of a problem raises new unsolved 
problems; the more so the deeper the original problem and the bolder 
its solution" (Popper, 1972). It is recognized that there are no complete 
solutions. Therefore the algorithm tests different models until one model 
is tentatively established. 

The algorithm is composed of four major steps: design of geodetic 

deformation networks, discriminatory analysis of different models, 
diagnostic checking and improved estimation. The use of additional 
knowledge (qualitative or quantitative) is the unifying feature of this 
algorithm. The following chapters elaborate on these topics. 

In Chapter 3, a D-optimal design for homogeneous deformation field 
is first derived. Then an entropy measure of information is used in a 
Bayesian setting which employs prior densities about the relevant 
deformation parameters as additional information for the sequential 
design of geodetic observations and for model discrimination purposes. 
The procedure is demonstrated through an example. 

Chapter 4 discusses the use of quantitative prior information in the 
estimation for the purpose of improving the deformation parameters. 
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First, a statistical testing method, which checks the agreement between 
prior information about model parameters and their estimates from the 
current observations is given. Second, a comparison criterion is set up 
to express the preferences about the estimators. The problems which 
may arise due to the use of prior information, particularly bias in prior 
information, are discussed. Several new theorems are derived for 
possible improvements in such cases. 

Chapter 5 considers the estimation problem when parameters are 
assumed to be stochastic in nature. A similar discussion follows as in 
Chapter 4. 
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Chapter 2 

CRUSTAL DEFORMATIONS AND AN ALGORITHMIC 
APPROACH TO THEIR ANALYSIS 


2.1 LOCAL, REGIONAL AND GLOBAL DEFORMATION ANALYSIS 

In this section the geodetic aspects of crustal movements are briefly- 
reviewed. An ideal approach to the analysis of the kinematics of 
tectonic motions would include both horizontal and vertical components 
for more meaningful inferences. However, the horizontal aspect of 
crustal movements is the main theme in this study. This is mainly due 
to the fact that the plate tectonic hypothesis, which explains tectonic 
motions, is constrained to the surface of the earth. Vertical motions are 
weakly explained through the horizontal motions which are sometimes 
considered to be a failure of the plate tectonic hypothesis in part 
(Beloussov, 1979). However, the proposed approaches in this study are 
general enough to include any type of data if they are proven to be 
relevant. 

First the existing approaches to the local, regional and global 
deformation analysis are briefly discussed. Then a general data analysis 
algorithm is proposed. 

The uniformity of relative plate motions and the rigidity of the 
major plates are the two crucial unifying principles of global tectonics. 
However, these postulates are undoubtedly valid in the long-term 
average sense and for the average totality of the plates. Significant 
departures and elastic deformations in both regional and local scales are 
likely to occur. In fact, such elastic deformations and strain 
accumulations have already been confirmed by the interseismic 
deformation observations along plate boundaries using geodetic methods 
(NRC, 1981). 
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The changes of shape and dimensions of the geodetic networks are 
considered to be deformations of the terrain surface. These 
deformations may be uniform or vary with time. Short- and long-term 
monitoring of these networks at seismically active areas provide 
information about the nature of deformations. 

Two different types of deformation networks are identified: relative 

networks in which all network points are located on the deformable 
object and fixed networks in which there are stable points outside the 
deformable object. 

The interpretation of observed changes in geodetic observables in 
terms of crustal deformations are by no means unique. Today several 
approaches exist for the analysis of geodetic entities. The differences 
among these approaches are a result of the type of the network under 
consideration. 

Let the position of the network points be given at an initial state 
by their projections (xo,yo) on the axes of a Cartesian coordinate system 
XY. Furthermore, let the network points along the same axes acquire 
the displacements u,v which are functions of the coordinates Xo,yo at 
every instant of time. The position (x,y) of an arbitrary network point 
at an epoch t is then determined in the same Cartesian system by the 
following coordinates 

X - Xo + u(xo,yo,t) 

y = Vo + v(xo,yo,t) (1) 

Displacement components u and v can be obtained by the difference 
of station coordinates adjusted at different epochs from the geodetic 
observations. Since physical realization of a coordinate system (fixed 
networks) on a deformable body is often very difficult, coordinate 
systems employed in these solutions should be carefully interpreted. 
Care should be given to use the same coordinate system for different 
adjustments regardless of whether or not they are realized as a result 
of minimum constraint, inner constraint or pseudo-inverse solutions. 
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A simple but not widely used approach in detecting network 
deformations is the conditional adjustment of observed quantities for 
each epoch. Their differences as a result of deformations may then be 
expressed in an appropriate coordinate system to display the relative 
displacement components of individual network points. Alternatively, a 
mathematical model can be set up in which displacement components 
appear as parameters. For instance, if baselines are observed at two 
different epochs, then the changes in these measurements may be 
expressed in terms of the displacement function as 

dX^j = sin o(<j dXj + cos o{<j dyj - sin dx^ - cos c(^j dy< (2) 

where d^jj = '*ioJo ~ baseline measurements 

between two network points at epochs t© and t respectively. <x is the 
azimuth of a baseline and 

= yj “ yjo 

If there are network points, either on the deforming surface or 
outside of it, which are observationally as well as statistically Justified 
to be stable (Koch and Fritsch, 1980), then a reference coordinate 
system may be realized through these points. Otherwise, the 
displacement components are not estimable quantities since they 
transform under changes of coordinate systems (S-Transformations, 
Molenaar, 1981). Therefore, some type of minimum norm or minimum 
constraints solution is required. Deformations, on the other hand, are 
independent of the coordinate system in the sense of changes in size 
and shape. 

Displacement components obtained through these procedures are 
ambiguous in the sense that they do not reveal much information about 
the nature of deformations. More insight is gained if the displacement 
field of the network points is approximated by means of algebraic 
polynomials (Ney, 1978). 
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Let the displacement function for the horizontal network points be 
written, for instance, for a polynomial of a second degree as 

u(x,y) = ao + 3iX + a 2 y + aax* + a 4 xy + ajy* 

v(x,y) = bo + bix + bay + bgx* + b 4 xy + bsy* (4) 

where a and b are the coefficients of the polynomial function. It is 
possible to determine these coefficients from the individual network 
point displacements which are obtained through one of the previously 
outlined methods. 

Alternatively, considering (1) and (3) and substituting (4) in (2), a 
new mathematical model is obtained in which coefficients of the 
approximating polynomial appear as parameters. A least squares solution 
of the resulting observation equations in a fixed network system with a 
sufficient number of observations yields to the determination of 
unknown coefficients. The question of the best fitting polynomial may 
then be answered by a variance test of the significance of estimated 
coefficients (Uotila, 1980). 

Once the approximating polynomial is determined, it can then be 
used for interpolation purposes, for filling the observational gaps 
(Ellmer and Welsh, 1982), or for further inferences about the nature of 
tectonic motions. It may also be useful in determining the local 

interpolating polynomials of the finite element method. 

Observed relative displacement of network points due to the tectonic 
motions are always small (except for sudden ruptures along the fault 
zones which occur during earthquakes). Furthermore, deformations 
which the earth’s crust undergoes as a result of applied stresses can 
be considered to be elastic to a certain extent. Consequently, linear 
elasticity theory, which deals with the description of homogeneous strain 
field, becomes one of the basic tools in analyzing tectonic strain field 
and tectonic motions. 

The interpretation of the observed changes in geodetic observables 
in terms of the crustal deformations is performed in this case by 
describing them through the horizontal strains. Although this approach, 
like the others, is not sufficient for coping with all the problems arising 
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in the analysis of tectonic motions, it is nevertheless better suited for 
further inferences since it provides spatial information in simple 
geometries and achieves a better presentation of results. 

Let the displacement function be represented by the following 
polynomial 


u(x,y) = a© + ajX + a 2 y 

v(x,y) = b© + bjx + b 2 y (5) 


Since infinitesimal strains are, by definition, the linear functions of 
displacement gradients and are given by 


3x ’ 3y ’ 2lay 3xJ ’ 2l3x 3yJ 


( 6 ) 


substitution of (5) into (6) leads to the homogeneous deformation model 
which can be expressed as 

u(x,y) = a© + 6xX + 6xyy - wy 

v(x,y) = b© + 6xyX + 6yy + «x (7) 

In this representation, the parameters have the following geometrical 
meanings: a© and b© correspond to translation elements, ex and Sy are 

the extensional components of infinitesimal strain along the X and Y axes 
respectively and are positive for extension and Oxy is the shearing 
strain, which is positive for the right lateral shear. « represents the 
infinitesimal rotation of the network points (in an average sense). When 
the above quantities refer to a certain period of time with uniform 
rates, the symbols are marked with a dot. The dimension of all strains 
is micro strain, or part per million, or micro strain per year. There is 
also a multitude of other representations of deformations, such as 
engineering shear, principal strains, etc. (see Pope, 1966 and Welsch, 
1982 for a detailed description of homogeneous horizontal infinitesimal 
strains). These elements in the sequel will be called deformation 
parameters. 
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Deformation parameters can be calculated by first determining the 
displacement components and then solving for the strain elements or for 
some other representations using these network point displacement 
components as pseudo-observations. Alternatively, they can be obtained 
directly from a mathematical model which is obtained, for instance, by 
the substitution of (7) into (2). Both approaches yield identical results. 
Their differences are due to manipulative convenience (Brunner et al., 
1980). This is true, however, only as long as the same estimation 
techniques are used in both approaches. As we shall see later, there 
are alternative techniques besides least squares. Results are likely to 
be different if different estimation techniques are used at different 
stages. In such conflicting circumstances it is intuitively clear that the 
direct estimation of the relevant deformation parameters is preferred 
since displacement components play a transient role in the analysis. 

The estimability of the relevant deformation parameters in this 
representation are again determined by the type of geodetic network 
and the type of geodetic observations. If, for instance, baselines are 
observed in a relative network and furthermore their scales are 
compatible at different epochs through calibration procedures, then the 
principal strains, total shear and dilatation are invariant quantities 
under coordinate transformations. They are therefore estimable. For a 
complete discussion of estimable quantities in infinitesimal strain field 
confer: Livieratos (1980), Dermanis (1981) and Welsch (1982). 

In any event, direct representation of repeated crustal movement 
observations in terms of estimable deformation parameters (the ones 
which are invariant under S-Transformations) is the simplest approach 
to avoid ambiguities which may arise due to the difficulties in realizing 
a coordinate system for the relative networks. Nevertheless the 
estimation of coordinate system dependent deformation parameters may 
sometimes be desirable (such as extensional infinitesimal strain normal to 
the fault trace). In this case a suitable coordinate system is introduced 
accordingly at a fundamental epoch. As a result, it is only natural to 
use at any time instant of observation the same coordinate system. 
Otherwise, these parameters are not comparable from one epoch to the 
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other. In this framework, the coordinate system introduced becomes an 
integral part of the descriptive model of deformations under 
consideration. 

The general procedure for calculating strain or displacement field on 
a regional scale can be achieved using the above approaches in 3-D 
(Harvey, 1985). However, treating the network as a whole as 
homogeneous may not be realistic in areas where complex faulting 
systems exist. In this case a nonhomogeneous (homogeneous with 
discontinuities) strain field is modeled that best accounts for the 
observed changes in observations (Chrzanowski et al., 1983). Also, the 
treatment of the strain field as homogeneous over the whole area 
covered by the network implies averaging relevant deformation 
parameters. If such assumptions are considered to be unrealistic, the 
area of interest can be dissected into smaller elements, such as 
triangles. These are then analyzed either individually in local scale, or 
a finite element solution which involves the geometric aspect of 
displacement field is possible (Welsch, 1983). 

In the case of global deformation analysis, it is assumed that the 
lithospheric plates move as rigid bodies on the asthenosphere with 
respect to each other. The rigid body motion of these tectonic plates, 
as a first approximation of global crustal movements, is the crucial 
assumption of the plate tectonics idea. After all, the entire concept of 
plate tectonics would not have much meaning if there are points which 
move relative to each other with velocities comparable to the velocities 
of relative plate motions. Therefore the axiomatic modeling of crustal 
motions on a global scale is much simpler than the local and regional 
scale phenomena. 

Since long baselines can today be measured with sufficient accuracy 
using space geodesy techniques, the plate tectonics theory becomes the 
initial working hypotheses in the analysis of repeatedly measured long 
baselines. 

Let fj and ?j be the position vectors of observing stations on 
different tectonic plates expressed in an adopted coordinate system XYZ. 
Then the chord length for a baseline vector r^j, at an epoch t. 
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connecting two different plates is given by 


r?j = (r, - ?j) • (?j - ?j) (8) 

Linearization of this model about the baseline vector r« t referring to 

O j 0 

an initial epoch t© yields 






£loJo 

^<oJo 


df, 


0 Jo 


dr. 


(9) 


where df< = dfj = Fj - are the infinitesimal displacement 

vectors as a result of global tectonic motions, r^j and are the 

observed baseline lengths at epochs t and t© respectively. Now if the 
observational errors u in baseline measurement differences y are 
considered, then the above model (9) leads to the general Gauss-Markov 
setup of linearized observation equations in which components of station 
displacements appear as parameters x to be estimated. For each 
baseline observed at epochs t and t©, one observation equation is 
formed, 

y = Ax + u (10) 

The solution of this system, in a least squares sense, can be achieved 
by fixing at least one plate and having at least two stations on each 
plate (three repeated baseline observations). An inherent assumption in 
the above model is that the scale of the baseline measurements is the 
same at both epochs. This can be achieved by a priori field calibration 
procedures. Parameters to be solved in this case are the relative 
displacement vector components referenced to the fixed plate. 

Alternatively, minimum norm solutions are possible. Through this 
method the coordinate system is defined in an optimal fitting sense. 
Similarly, if additional information exists about the displacement vector 
for at least one plate a unique solution for the above set up is again 
possible (Bock, 1982). 

Nevertheless, as in the case of local and regional deformation 
analysis, recovered displacement components are ambiguous as they 


13 


stand. They do not reveal much information about the nature of plate 
motions. They simply supply refined quasi-observational data though 
which kinematical or theoretical models can be tested. However, if such 
models are postulated beforehand, they can then be expressed directly 
as a function of baseline observation differences. 

Of such models, instantaneous plate kinematics is the most popular 
in today^s studies. In this approach, displacements assumed to be 
constrained on a spherical earth model are the results of rigid motion of 
plates. Consequently, the motion of the plates can be described by an 
axial rotation following Euler’s theorem (Goldstein, 1965). 

Consider dr^ and d?j to be changes in the position vectors for 
stations located on different plates i and j respectively. Then the 
following relationships hold 


£Ll 

dt 


X 


dr i 

dt ^ ’’j " 


( 11 ) 


where 0^ and Qj are the pseudo-vectors of angular velocities on plates i 
and j respectively. Their Cartesian components are given by 



■ ■ 


CJ COS <> cos X 

0 = 

02 


cj cos 4> sin X 


- ^3 . 


0 sin ♦ 


( 12 ) 


♦ and X are the spherical coordinates of instantaneous rotation axis and 
6 is the magnitude of the angular pseudo-vector. 

Substituting (11) into (9) with the assumption of linear velocities 
results, after some simple manipulations, in the following mathematical 
model of instantaneous plate kinematics 


^^oJo 


jAt 

^’oJo 


[ (^’0 “ ^ Jo ) : (^’o “ ^ Jo ) ] 


Oi 


«j 


(13) 
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where At = t - t©. In this expression components of angular velocity 
pseudo-vector appear as parameters to be estimated as a result of the 
hypothesis which is put forth. 

Estimation of these parameters follows the same arguments as in the 
case of estimation of displacements components. (13) can also be 
expressed as a function of spherical coordinates using the formulas of 
geometric geodesy (Drewes, 1982). 

So far, current approaches to the representation of tectonic motions 
have been briefly discussed within the geodetic framework. Common to 
all these methods is the assumption of uniform time dependency. It is, 
however, recognized that strain accumulations at local and regional 
scales are likely to be nonlinear in time (Thatcher, 1983). There is also 
the possibility of nontectonic motions of network points at all scales. 
Therefore, further refinements in such models become a necessity. 

These problems, along with some new methods in the analysis of 
tectonic motions, are discussed in the following proposed general 
deformation analysis algorithm. 

2.2 AN ALGORITHMIC APPROACH TO CRUSTAL DEFORMATION ANALYSIS 

It is widely recognized today that most of the information about the 
nature of crustal movements is supplied by the repeated geodetic 
surveys (NRC, 1982). Providing a choice of reliable quantitative data 
about the network deformations has been the traditional undertaking of 
geodesy in crustal dynamics studies. Beyond this routine task is that 
of a growing necessity to conjure up more detailed operational 
hypotheses about the kinematics of tectonic motions by geodesists. In 
the meantime, increasing information provided by other disciplines and 
different techniques can now be introduced into the analysis of current 
geodetic observations for more meaningful inferences. 

In the present chapter, an algorithm which combines additional 
information and sample data is proposed. It is a more general approach 
in comparison of the current trends in analyzing geodetic data. 
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The algorithm is composed of the following stages (see Figure 1): 
deformation model oriented network design, discriminatory analysis of 
several concurring models, diagnostic checking and improved estimation. 
Relevant methods that can be used in this algorithm are exploited in the 
following chapters. 

The identification of a suitable model of the deformation process 
which fits well to the changes in the observed geodetic quantities and 
also explains the differences between local regional and global scale 
components of tectonic motions is the main purpose of crustal movement 
analysis. Predicted results derived from such models can then be 
compared with theoretical data supplied by other disciplines and thus be 
used to test the validity and credibility of the fundamental assumptions 
of geotectonics theory. 

Since the underlying physical phenomenon which causes the plate 
motions and the deformations along plate boundaries is still poorly 
understood, the modeling efforts in all scales are to be descriptive. 
These descriptive models, if not capable of constructing the actual 
phenomena by the critique of repeated geodetic surveys, do at least 
play an essential role in demonstrating what hypothesis should not be 
put forth. Questions such as: "Does the rate of tectonic motions 

observed change with time?", "Does the rate of movements vary from 
one place to the other?" and "What can be learned from them aboiat the 
nature of driving forces?" lend themselves to different models. 

Sometimes discrepancies found in different scales can be due to the 
coarseness of the model used. In such cases more detailed models are 
needed to explore the possible modes of deformations, especially in the 
active plate margins where severe deformations are likely to occur. 

Other times there may be more than one descriptive model which 
concurs with the differences of repeated geodetic surveys. Thus, the 
modeling efforts are unescapably iterative in nature on different 
postulates. 

Although postulating a general class of models is mostly based on 
the experience and intuition of an experimenter, the qualitative 
information provided by other disciplines and previous in situ 
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measurements play an important role in the identification of different 
models. Discrepancies found in the interpretation of different types of 
information can help geodesists to postulate such models. 

The selection of the best model can be achieved by performing 
repeated geodetic observations and using statistical tests on estimated 
parameters of different models. Therefore, the model discrimination 
process is intimately related to the design of optimal deformation 
networks and optimal observations. 

Model oriented deformation network designs are necessary in this 
respect. However, such designs are still not differentiated from the 
classical network designs, mostly due to the fact that deformation 
parameters are obtained from the adjusted coordinates in a two step 
procedure as summarized in the previous section. A model oriented 
D-optimal design for the homogeneous deformation field is constructed 
analytically in Section 3.1 as an example. 

Meanwhile, such designs may not be satisfactory if they are 
confined to a single model of the displacement field. Therefore, it is 
also necessary to develop designs which do well for different competing 
models. However, analytic construction of such designs is difficult even 
in simple cases. Iterative methods for constructing model robust linear 
optimal designs are given in a series of articles by Cook and Nachtsheim 
(1982) and Lauter (1974). 

The next step in the algorithm is the identification of the best 
working hypothesis among the others. Since these are temporal models, 
their identification should be a result of sequential experimental designs. 
That is, the observations and the analysis are designed in stages. In 
this way, the data collection procedure can be carried out in an optimal 
fashion since the geodesists set forth the end-purpose of the 
investigation being conducted. 

First, models of the deformation process are postulated. A network 
design which does well for such models is set up. This initial design is 
surveyed at least at two different epochs and the models are evaluated. 
The tectonic motions observed in nature are always small and very often 
they remain well below the observational accuracies of geodetic methods. 
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Furthermore, they are likely to be temporally nonlinear, such as in 
exponential decay in strain rates (Thatcher, 1983). Therefore preference 
of one model over the others is not conclusive if it is based solely on 
observations performed at a few epochs. The data may tend to favor 
one model over the others as the number of surveys increase. This 
information can be used to design new observations. This model 
discrimination procedure is carried out until one model is selected. 
Each time new observations are designed and performed using the 
information gained from the previous resurveys and model evaluations. 
Such a method of model discrimination is formulated in Section 3.3 in a 
Bayesian setting using the entropy concept of information. 

Descriptive modeling is quantified by the estimation of the relevant 
parameters. These two processes are mutually exclusive. Once the 
parameters are estimated, they can in turn be used to design new 
improved models and better estimates. The limiting factor in this 
process, first lies in the accuracy of geodetic measurements. Their 
uncertainties include contributions from both systematic and random 
errors which can be reduced effectively only up to a certain level 
(Mierlo, 1975; Baarda, 1975). Secondly, the coverage of deformation 

measurements for monitoring tectonic motions is still confined at 
particular regions because these techniques are still time consuming and 
expensive. Therefore, such a finite number of samplings may not reveal 
the overall behavior of the deformations since they are representative of 
only a regional and temporal average sense. 

On the other hand, deformation parameters estimated from these 
sample observations can further be improved to some extent by the 
introduction of extraneous information in the estimation (it is obvious 
that there is no need to introduce additional information if the model 
parameters are found to be satisfactory). Perhaps the most important 
example of the use of prior information in geodesy may be proven to be 
in crustal dynamics since the process includes many entities of the 
lithosphere in a complex way, and it involves several disciplines. 
Therefore, there is an abundance of qualitative and quantitative 
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information in addition to the information provided by the geodetic 
methods. 

The qualitative aspect of this information is useful, as discussed, in 
postulating different models. Quantitative information may arise in 

various ways in crustal movement analysis. Sometimes theoretical 
solutions such as global motion of plates based on the driving force 
models for plate tectonics (Solomon and Sleep, 1974; Kaula, 1975) or the 
global intraplate stress fields predicted by various force systems, other 
times the results of past samples such as in situ measurements of strain 
through extensometers, resurvey of old geodetic networks or examination 
of stress induced geological structures, are thought to give information 
about parameters. In all these cases, prior information does not arise 
from the particular body of data currently being analyzed. If this 
information is correct, then it is surely useful to incorporate it into the 
statistical estimation procedures as well as physical modeling of such 
phenomena. Such prior information may be valuable in increasing the 
precision and the reliability of the estimates, particularly when samples 
are limited in size. 

Therefore, the study of valid prior information becomes an integral 
part of the proposed algorithm which will be referred to as diagnostic 
checking in this study. In this step of the algorithm, quantitative prior 
information should be identified and tested against the sample 
information. A null-hypothesis testing procedure is given in Section 4.2. 
If prior and sample information are found to be compatible then they 
can be combined in the improved estimation of parameters. 

However, noncompatibility as a result of such tests by no means 
implies that prior information is erroneous and should be totally 
disregarded. Testing procedures themselves are likely to contain 
further uncertainties, after all. For example, probability distributions 
attributed to the sample observations may not be realistic or 
significance levels, if used, may be too small compared to the signal to 
noise ratio. There may also exist the possibility that the information 
contained in current data is different than the additional information 
simply because the latter is representative of tectonic motions only in 
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the long-term average sense. Similarly, prior information as a result of 
global solutions may be different than the small scale information 
because of the local properties of tectonic motions. 

Consequently, prior information in practice is not known with 
certainty. What then is the purpose prior information if we can never 
be sure of the validity of such constraints? The answer lies in the 
fact that additional data may always contain a certain component of 
reality. If the value of it is found to be small then it is clearly unwise 
to use dubious information. However, if the value of improvement is 
large, then it may pay to use even dubious information. 

A better alternative is to use estimation techniques which are 
insensitive to wrong prior information about parameters but which still 
result in better estimates than those using sample observations only. 

At this point, it is necessary to identify two different concepts in 
estimation. First, the well known approach, namely, estimation when the 
model parameters are deterministic; and second, estimation when the 
model parameters are considered to be stochastic (random). In the 
latter case, statistical model setups are known to be random effect 
models (Searle, 1974; Harville, 1976). Estimation of random parameters is 
also called prediction (Searle, 1974; Schaffrin, 1983). 

So far implicit in the discussion is that the mechanism of the 
underlying phenomena is deterministic but unknown to the user. The 
representation of displacement field with stochastic parameters does not 
contradict with this view. 

There is no conflict between causality or randomness or 
between determinism and probability if we agree, as we must, 
that scientific discoveries are not discoveries of nature but 
rather inventions of human mind. Their consequences are 
presented in deterministic form if we examine the results of 
a single trial; they are presented as probabilistic statements if 
we are interested in averages of many trials,,, (Papoulis, 1984). 

Similarly, in crustal movement analysis, averaging strains over whole 
areas as well as over certain periods of time can be interpreted to mean 
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strain parameters are random with certain mean and variances. This 
becomes much more clear in the case of global plate kinematics. In this 
process the deformations in temporal and spatial spectrums can be 
interpreted as smooth motions as trends over geological time scales with 
superimposed short term jerky motions (Kaula, 1978). Solutions such as 
numerical analysis of instantaneous plate kinematics (Minster and Jordan, 
1978) can be recognized as the expected values of possible tectonic 
motions whereas current global observations are contaminated, in 
addition, with short term random effects. 

Therefore, information as a result of temporal and spatial averaging 
can be introduced into the analysis of current tectonic motions within 
this interpretation. However, in all likelihood such information may be 
wrong. Hence, it is desirable to have estimates which are not dominated 
by erroneous prior information. 

Of course estimators with such virtues are hard to find. Moreover, 
it would be a mistake if one estimator or predictor is singled out among 
the others and used under all varying conditions. Therefore the study 
of suitable estimators, their behavior under different types of prior 
information and the gain from using prior information becomes the last 
part of the proposed algorithm. These problems are examined in 
Chapters 4 and 5. 
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Figure 1. Algorithm for Crustal Movement Analysis 








Chapter 3 

DESIGN AND MODEL DISCRIMINATION 


3.1 D-OPTIMAL DESIGNS FOR HOMOGENEOUS STRAIN FIELD 
A geodetic network is defined through geodetic measurements between 
observation points and the definition of a datum (coordinate system). 
Geodetic networks are designed according to a particular task under 
consideration, type of measurements and their accuracies using a 
predefined optimization criteria. Optimization of geodetic networks are 
well established in the geodetic literature (see (Grafarend et al., 1977) 
for a comprehensive review). Design of deformation networks can also 
be examined within the existing approaches. Bock (1982), using 
approximate theory for optimal designs, has established D and A optimal 
polyhedra designs for the distribution of crustal movement monitoring 
stations over the earth. However, as discussed in Chapter 2, the 
two-step procedures (first point displacements in terms of coordinate 
differences at different epochs are determined then the deformation 
model parameters are solved using the results of the first step as quasi 
observations) can equivalently be replaced by the direct estimation of 
relevant parameters. In this case it is natural that optimal design of 
deformation networks should be oriented to these particular models of 
deformation. But this problem has not been fully exploited in the 
geodetic area. 

In this section, a model oriented D-optimal network design is 
constructed as an example of the homogeneous deformation field model 
using probability measures. 

Consider the linear model 

y = Ax + u , u ~ (0 ,<t2I) (1) 
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where y is an n*l vector of observations, A is an n*m design matrix of 
full column rank and is composed of Ixm vectors of rows (a< ) depending 
on the form of the model assumed. x is an m*l vector of unknown 
parameters, u is the vector of independently and identically distributed 
random variables with mean zero and variance orj. The experimental 
region is denoted by x and a^ are continuous on x> Then from the 
ordinary least squares, the dispersion matrix of x is 

D(x) = N“‘ (2) 

where N:= A). In subsequent discussion N will be called the 

information matrix of the experiment e and will be denoted by N(e). 

The design problem, therefore, consists of selecting vectors of 
control variables aj e x » i = 1» •••» n such that the design defined by 
these n vectors is in some defined sense optimal. 

Definition 1 ; We define the optimal design for deformation networks as 
the one that minimizes the determinant of the covariance matrix of 
parameters. 

Let n denote the number of repetitions of the i*^^ observed quantity 

n 

such that = I and Mis the total number of observations; then, 

1 = 1 

Definition 2 ; (Fedorov, 1972). A normalized design e is the collection of 

n 

variables Pi, Pa,..., Pn» sii » ^a, •••» an» where I p< = 1 and p< = n^/N. 

1 = 1 

Theorem 1 : (Fedorov, 1972). The family of matrices N(e) corresponding 

to all possible normalized designs, is convex. 

Corollary 1: If |N(e)| is the determinant of the design of an 

experiment, then maximizing |N(e)| is equivalent to minimizing |N(e)|“*. 

Definition 3 : (Silvey, 1980). The design that maximizes |N(e)| is called 

D-optimal. If the errors are normally distributed then a confidence 
ellipsoid for the parameters x (of given confidence coefficient and for a 
given sum of squares arising from the design e), has the form 
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{x: (x - x)'N((x - x) < constant} 


(3) 


where k is the least squares estimate of x. The content of this ellipsoid 
is proportional to Hence, D-optimal design makes this ellipsoid as 

small as possible. In summary, e is D-optimal iff N(e) > 0 (positive 

definite) and 

max |N(e)| = |N(e*)| (4) 

a c X 

where e* is the optimal design. Since u ^ (0 ,<t 2I)» D-Optimal design is 
also a minimax design (Kiefer, 1961). 

Definition 4 : (Fedorov, 1972). e is G-optimal iff 

min max d(a,e) = max d(a,e*) (5) 

e a e X 

where d(a,e) is the variance of the expected response E(y). In words, 
the optimal design e minimizes maximum of d(a,e). A sufficient condition 
for e to satisfy G-optimality is 

max d(a,e*) = m (6) 

a e X 

Theorem 2 ; The general equivalence theorem (Kiefer, 1961): Conditions 

4, 5 and 6 are equivalent. 

We are now ready to set up a semi- intuitive approach to determine 
an optimum design for the configuration of deformation networks with 
baseline observations satisfying D-optimality criteria. By consideration 
of continuous analogs of discrete designs we will arrive at certain 
conditions under which discrete designs can be constructed. Such an 
approach reflects the approximation of continuous designs with a 
distinct number of points in their spectrum. 

Definition 5 ; A design measure is a probability measure denoted by f 
on X* Specifically, f c (D) of all measures defined on the Borel field B, 
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generated by the open sets of x such that 


J d^(a) = 1 0 < f < 1 (7) 

X 

Given a measure in D the information for the experiment e will be taken 
as 


N(e) = J a'a df(a) (8) 

X 

When f(a) is an absolutely continuous distribution, it has a probability 
density function given by p(a). Equations (7) and (8) then reduce to 

N(e) = J a'a p(a) da 
X 

J p(a) da = 1 p(a) > 0 (9) 

X 

In the case of a design with finite spectrum having n points 

N(e) = I p^a;a^ (10a) 

1 = 1 

Zp<=l 0<P(<1 (10b) 

1 = 1 

Then for any discrete design e it is possible to form a measure f by 
attaching a mass n^/N to each point of e. From these definitions we see 
that if a continuous design is found then the discrete designs can be 
constructed by solving the equations in (10) for a and p. 

Consider now the mathematical model for repeated distance 
observations at two different epochs (Pope, 1966) 

di = i sin^of e>< + •* cos^oi i sin 2a e^y (11) 
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This model appears as a trigonometric function in azimuths where e^, Sy 
and 6xy are the deformation components of the infinitesimal strain tensor 
to be estimated and 

a^ = i[sin*o(^ cos*o(< sin2o(<] (12) 

o(< is the azimuth of an arbitrary baseline observation i and di = i - io 
is the difference of two baseline observations performed at two different 
epochs. Then the information matrix for these pseudo-observations is 


N(e) = J a' (of) a(of) df(a) 
X 

If a uniform measure 
df(a) = (27r)-‘ dof 
such that 



(13) 


(14) 


(15) 


is selected in the domain of all possible azimuths [0,2n], then from (13) 


N(e) 


(2rr) a' (of) a(of) dof 
0 


Now considering the following relationships 


(16) 


j.27T 

sin kx cos dx = 0 i * k 

“^0 

p27r ^ 2 tt 

sin^kx dx = cos^kx dx = 0 
”^0 0 


(17) 


it can easily be shown using (12) and (16) that 
d(of<,e) = 3i (of) N(e)~‘a5(of) = m = 3 


(18) 
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Therefore, baseline observations with the above uniform measure are 
G-optimal. From (4) and the equivalence theorem it follows that uniform 
design is D-optimal and minimax. 

Now a discrete design can be found. The azimuths o(j and point 
masses p< must be such that the following equations hold (from equation 
10 and 12) 

Pi^ii “ ZPi^aa ~ g 5.'^?, Pi^ia ~ g 

5. Pi®i 3 ~ 5. Pi^aa ~ 0 J — 1, i — 1, 2, . . ., n (19) 

It is not difficult to verify that for ti = ^a = •* 3 , n = 3 and p^ = 1/3, 
the above equations are satisfied. It follows that an equilateral triangle 
is D-optimal. Similarly, regular polygonal designs composed of 
equilateral triangles are also uniformly D-optimal for the homogeneous 
strain field. It can be shown that certain arbitrary designs of 

equilateral triangles are D-optimal. But uniformity of the design is no 
longer valid for these configurations, i.e., p^ * 1/n. 

Since the above approach is oriented to the optimal configuration of 
deformation networks in the homogeneous deformation field, it can be 
classified under the first-order design problem of geodesy. In the case 
of P( ^ 1/n, resulting configurations composed of equilateral triangles 
can be classified within both first- and second-order design problems 
by interpreting p< as observational weights. Meanwhile this approach 
is not sufficient if more than one model which can represent the 
deformation field, as recognized and discussed in the proposed 
algorithmic approach, are postulated. In this case, it is necessary to 
design deformation network oriented network designs which does well 
for different competing models. Such designs can be constructed 
iteratively using also prior preferences as weights on each model. This 
problem can also be examined, as proposed by Schaffrin (1986, private 
communication), within the reliability aspects of geodetic networks 
(Baarda, 1977). This problem will not be discussed in this study. 
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3.2 SEQUENTIAL MODEL DISCRIMINATION 

In Chapter 2 the importance of model discrimination in crustal movement 
analysis was discussed in conjunction with the proposed algorithm. 
Currently five major approaches to the model discrimination problem can 
be identified (Chrzanowski, 1981): Delft, Hannover, Karlsruhe, Munich 

and Fredericton. 

The Delft approach (Mierlo, 1978; Kok 1977, 1980) consists of first 
testing each single network measurement by so-called data snooping; 
second, testing the stability of the reference points using data snooping 
again but this time treating coordinates from individual network 
adjustments as observations; third, testing of single point displacements 
at each point of the network; and fourth, testing of deformation models 
which are postulated a priori as systematic relative displacements of a 
group of points in the network during an epoch. 

The Hannover approach (Pelzer, 1974; Niemeier, 1979) is based on 
global congruency tests using analysis of variance techniques. First an 
outlier check of observations for each single epoch is performed, and 
the observational weights within and between epochs are determined. 
Second, the global congruency of the network geometry between the 
epochs is tested to detect significantly moved network points at 
different epochs with respect to the mean geometry of the network. 
Then the corresponding displacements are computed. 

The Karlsruhe approach (Heck, 1980) is based on the analysis of 
variance and use of confidence regions for point displacement vectors. 
The approach consists of the following steps. First, measurements for 
each epoch are adjusted for outlier control and for the final 
determination of observational weights. In the second stage, the 
internal stability of a group of reference points is tested. Then a 
simultaneous adjustment of all epochs is performed using the stable 
network points as reference points. From the resulting coordinate 
differences of the other network points, displacement vectors with 
respect to these fixed points and their corresponding confidence regions 
are computed and displayed for the visual interpretation of the 
resulting deformations. 
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The Munich approach (Welsch, 1982) puts emphasis on the 
description of deformations in consideration of coordinate system and 
geodetic datum invariants. First the existence of significant 

deformations is checked using a datum invariant statistical test. 
Repeated observations are then modelled as a function of certain 
parameters of the deformation field which are again invariant with 
respect to the underlying geodetic datum and coordinate system. These 
parameters are then estimated for each epoch together with their 
significance based on a certain level of probability. 

The Fredericton approach (Chrzanowski, 1981; Chrzanowski et al., 
1983) cdmbines some of the aspects of the previous approaches in a 
more general algorithm. In addition it analyses and interprets the 
resulting changes in the repeated observations within the infinitesimal 
homogeneous as well as nonhomogeneous deformation field. First, 
observational data is screened for outliers and possible trends of 
deformations are investigated. Second, single point displacements or 

relative displacements of the network points are determined employing 
the changes of the observed quantities at different epochs as 
observations using a minimum constraint solution. The resulting 
displacement components are then used as quasi observations which are 
then modelled as a result of postulated deformation models. Finally 
these deformation parameters are estimated for each model and the best 
fitting model is selected using the results of global statistical tests and 
the calculated significance levels of deformation parameters for each 
model. 

In summary, the existing approaches to the analysis of crustal 
deformation measurements are oriented first to the detection of crustal 
movements and in a limited sense to the descriptive model discrimination 
problem. However, in the analysis of statistical decision theory, two 
problems are generally distinguished. One is the problem of making the 
best decision (identification of an appropriate model in our case) on the 
basis of a given set of data, and the other is the problem of designing 
the best experiments in order to get information upon which a decision 
will be made. In this respect model discrimination procedures in an 


30 


algorithmic approach should include these two problems. But the 
existing approaches to the analysis of crustal movements address only 
the problem of model discrimination. A combined analysis based on the 
design of measurements to facilitate the discrimination procedures within 
the scope of an experimental design concept has yet to be exploited. 

In this section, a method based on Bayesian philosophy and entropy 
measure of information is proposed for the elucidation of time-dependent 
models of crustal motions. The method, which is due to Box and Hill 
(1967), combines model discrimination and design of measurement 
concepts in a sequential strategy. First the method is reproduced, then 
the strategy is adapted to the time dependent models of crustal 
movements through an example. 

Assume that geodetic surveys are performed at different epochs. 
The data indicate that crustal motions occurred during these time 
intervals and a set of concurring descriptive models are postulated 
either as a result of the displayed dislocation patterns of the network 
points and/or previous information. The question is how to use the 
data to select the best model and to design new optimal observations to 
facilitate model discrimination. 

Since in general every decision is a result of some type of decision 
rule, it is necessary at this point to define a discrimination criteria. 
Let the state of the deformations be represented by the parameter 
vector X and suppose that H is the set of concurring models 
(hypothesis). The preference of one model over the others (best model) 
can be ranked by means of a loss function L(ff,x). If 

X(Hi,Xi) < KHa.Xa) (20) 

holds, then H, is preferred to where H,, H 2 f. H. But since x is 
unknown and must be estimated from a set of observations y which are 
random variables due to the observational errors, it is intuitively much 
more appealing to make a decision based on the smallest possible 
expected loss over all conceivable set of samples S rather than loss 
only. Therefore, 
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( 21 ) 


r(X,x):= Ji:(H,x)f(ylx) dy 
S 

where r(L,x) is the expected loss (or risk) and f(ylx) is the density 
function of y given x. Thus, the problem of selecting the best model 
can be judged by examining the above risk function when x is unknown. 

However, if some a priori information is already available about x 
before the observations are performed, this information can be used to 
make the best selection after the data is available. If, in particular, 
this information is a probability function on x then, 

R(X,x):= J J X(H,x)f (ylx)f(x) dydx (22) 

P S 

or considering (21) 


R(i,x) = J r(X,x)f(x) dx (23) 

P 

where f(x) is a multivariate density on the parameter space P 

representing the prior information and f(ylx) is now interpreted as the 
conditional density of y given x. Again, it is possible to make a 

selection, this time in the presence of prior information, which minimizes 

the risk given in (23). This approach in general is known as the 

Bayesian solution. Equivalently, considering the Bayes formula, the 
conditional distribution of x given y is defined as 


f(xly) 


f(ylx)f(x) 

f(y) 


if f(y) > 0 


where 


f(y) = J f(ylx)f(x) dx 
P 


(24) 


is the marginal density on y. Then (23) takes the following form. 


R(X,x) = [Jx(H,x)f(xly) dx] f(y) dy 


(25) 
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Now if a loss function that measures the relative importance of various 
selection errors is specified, the posterior density function is derived 
from the prior density and the likelihood function f(ylx) of the process 
is known then the optimal decision can be made for the hypothesis 
which minimizes the expected loss. 

A number of solutions can be obtained to this problem for different 
loss functions assigned. For instance, a commonly used function for the 
selection error (x - 51*) is its quadratic function, where x is the true 
parameter vector and 51* is the estimated parameter vector for the i*’^^ 
model. 

An alternative approach is to use entropy measure of information 
(Box and Hill, 1967) which has been developed in communication theory 
(Shannon, 1948). 

Consider a complete set of events, E^, i = 1, 2,..., m whose probabil- 

m 

ities are pj, Pm such that Z Pi = 1. The expected information 

i = i 

(I) of the message on the occurrence of one of these events is defined 
as 

I:= Z Pi In ^ (26) 

i=i Pi 

which is also known as the entropy of the distribution whose 
probabilities are p^, i = 1, 2, ..., m (Shannon, 1948). The least possible 
information occurs when p, = Pa...Pm ° 1/m, which can be derived by 
maximizing the above equation subject to the constraint Z Pi =1. In 
this case, the amount of information is small and the entropy as a 
measure of disorder is maximum. In other words, all events are equally 
likely. In situations where the probability of one event pj is larger 
than the probability of other events pj, j * i the amount of information 
is considered to be large and entropy is small. This concept can now 
be applied to the discrimination of different competing models. Let 

there be a set of m competing models and the a priori probability of the 
i^h model being true is pj. If the observations are performed and the a 
posteriori probability for the i*** model is computed then the information 
gained by this experiment is specified, from (26), as 
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AI(H,x) = - f p,„_i In p,n_, + X Pin In p,„ (27) 

1=1 1=1 

This expression is similar to the previously discussed loss function 
L(H,x). However in this case the maximum of AI(H,x) is of interest in 
order to obtain the greatest amount of information out of the 
experiment. Now, substituting (27) into (22) and considering that there 
exists a finite number of models, the expected change in entropy 
(information) E(AI) =:AJ before and after the n^^ observation is obtained 

AJ = I Pin-1 I I Pkn In Pkn - I Pk„-i In Pkn-i| ’ Pi(y'x) dy„ ( 28 ) 

1 = 1 J '^k=i k=i 


where 


_ Pln-1 Pl(ylx) 
Pin m 

J Pin-iP^(y'x) 
e=i 


Substituting (29) into (28) 


AJ = i Pin-1 Pi (y*x) In 


Pi (y'x) 

m 

J Pin-iPi(y'x) 
i=l 


Now an observation which maximizes (30) is considered the optimum one. 
Evaluation of (30) however, is quite complicated but an upper bound for 
this expression leads to a tractable form. Consider 


?! _ _ / . s , Pi(y'x) . . . . , Pi(y'x) 

I Pjn-iPi(y'x) In , , ) > Pi(ylx) In — 


.1 PinPi(y'x) 
i = l 


which is a result of Corollary 3.1 of Kullback (1959). Substitution of 
(31) into (30) gives an upper bound AJ^ for AJ 


r 

55 55 Pi(ylx) PT(ylx) 

^IPin-iPjn-i Jpi(ylx)ln dy„+ pj(ylx)ln ^777^0 
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Let now a group of competing models be given by 


E(y(^)) = , i = 1, 2, . . . , ra (33) 

where y(^) is the mxl vector of observations for the i^^ model, is 

the nonstochastic nxu design matrix, and x(^) is the uxl unknown 
parameter vector for the i*^ model, u is not necessarily the same for 
all models. If the observations y„ are assumed to be distributed 
normally with mean E(yn) and known variance <r^, the following 
relationships hold (Box and Hill, 1967) 

Pi (yn‘E(yn),ff] = exp{- ^ [y„ - ECy^)] } (34) 

Pi(E(yn)l‘^j = 7==“ e^^pl" 2^ [E(yn) - y^^^] } (35) 

y 2 tt <r^ '■ 

where y,(*) is the predicted value of y„ under model i using n - 1 

observations and its variance is <rf which is given by 

<rf = <r‘a(0[A(0'A(0}~^a(0' (36) 

where a,(*) is the row vector of a(*). Prom the definition of the 

probability density function of y„ under model i given <r and n - 1 

observations 

Pi(yn'<^) = J Pi (yn'E(yn),<r) Pi(E(yn)lcr) d(E(y„)) (37) 

If (34) and (35) are substituted in (37) and integrated then 

= /■2rT(a^T;f) ®^p(" 2(<r=‘\ <r?) (38) 

Substitution of (38) into (32) results in the following operational form of 
discrimination function 
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f 1 + I 1 

l(<r* + <r?) (<r^ + 


where y,(0 and y,(j) are the predicted observations for model i and j, 
<r^ and <rj are the predicted variances obtained from (36) for these 
predicted observations and <r* is the known a priori variance of the 
observations. 

It is now possible to design sequential geodetic surveys to 
discriminate the descriptive models of deformations using this entropy 
measure of information. The scenario which is depicted in Figure 2 is 
as follows. 

First an initial network design for the area under consideration is 
constructed, for instance, using the D-optimal design criteria which is 
discussed in Section 3.1. Geodetic surveys are then performed at two 
different epochs covering the whole network. This is followed by the 
estimation of deformation parameters from the differences of observed 
quantities. At this point, information provided by the current estimates 
and prior qualitative information about the models are examined and 
prior probabilities are assigned to each model. If no preference is 
inferred from the existing information each model is assigned equal 
probabilities p = 1/m, where m is the number of models. The next 
optimal observation (not necessarily the resurvey of the whole network) 
that gives the maximum expected discrimination among m rival models is 
sought in (39). Then the new optimal measurement(s) is performed and 
posteriori probabilities for each model are computed using equation (29). 
Finally, the current standing of each model is examined. This procedure 
is repeated each time, using posterior probabilities of previous 
observations as prior probabilities for the succeeding observations, until 
one model emerges from the others. The following numerical example 
illustrates the procedure. 
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Figure 2. Sequential Model Discrimination 


37 










3.3 NUMERICAL EXAMPLE 

In Chapter 2, the necessity for sequential experimental designs for the 
crustal movement analysis was discussed and became part of the 
proposed algorithm. In the previous section, a method based on the 
entropy measure of information is presented as a possible candidate for 
the discrimination of several competing time-dependent models. In order 
to get a better feeling for the applicability of the method to the crustal 
deformation analysis, this section describes a numerical example. 

Consider the case of homogeneous deformation field. The following 
descriptive models are postulated as a result of prior experiments to 
represent possible network deformations. 


model 

1. 

dx = + e^yy) At 

dy = (e^yX + 6yy) At 

(40) 

model 

2. 

dx = e^x At 

(41) 

model 

3. 

dx = At 

dy = e^yX At 

(42) 


where dx and dy are the displacement components of network points for 
the period it, e^ and Sy are the extensional strains in X Y directions, 
e^y is the shearing strain. If the baselines are observed at different 
epochs then the baseline length i ^ j at epoch t is given by the following 
expression, 

= (Xj - X,)" + (Yj - yi)=' (43) 

Linearizing this expression about the initial epoch to and considering 
equations (40), (41) and (42) results in the following mathematical models 


model 

1: 

^ijt “ 

jt 

model 

2: 


jt 

model 

3: 


i j t 


= Ati i j (sin^ofj je^+cos* 
= At j sin^ofj j 
= At i j j sin 2o(^ j e^y 


i jey+sin2o(^ je^y) (44) 

(45) 

(46) 
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where ^ijt and ai"® the observed baseline lengths at epochs t and 

to respectively, and ofjj is the azimuth of the observed baseline i - j. 
It is assumed that the scale of baseline observations are the same for 
different epochs and no stable points or external reference observations 
are available on the deformable object. Although the differences of the 
observed baselines can be expressed in terms of the coordinate system 
invariants of the homogeneous deformation field, the above 
representation is chosen to investigate and model the contributions of 
the infinitesimal strain elements in an "adopted” orientation of the 
coordinate system. Since the above mathematical models (44)-(46) are 
functions of the baseline azimuths ot^j, the orientation of the coordinate 
system should be kept the same from one epoch to another. Otherwise 
model parameters are not directly comparable at different epochs. 

In this framework, model 1 implies that observed baseline 
differences are explained by all three components of the infinitesimal 
strain elements within the adopted orientation of the coordinate system. 
Model 2 suggests that these differences are only due to extensional 
strain in x direction, whereas model 3 investigates the differences as a 
function of the shearing strain. Note that models 2 and 3 are special 
cases of model 1. For particular orientation of baselines within the 
adopted coordinate system, models 2 and 3 and models 1 and 2 cannot 
be differentiated from each other for djj = 0 * and 0(5 j = 90* 
respectively. However, as we shall see, the discrimination function 
successfully selects the alternative orientations (namely, the baselines in 
the N-E and N-W directions) for which all three models are identified. 

In this example, model 3 is chosen to be the correct model and the 
pseudo-observations derived using this model. They 

are also contaminated with a noise from a normal distribution with mean 
zero and variance 1 mm^. Shearing strain, e^y, in the correct model 
(46) is 0.50 ppm and the time interval between observations is constant 
and equal to a month. The sequential model discrimination procedure 
can now be performed using the algorithm depicted in Figure 2. 

An initial design is set up for the measurement of deformation 
parameters (Figure 3a). This is a D-optimal design of model 1, which 
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was derived in Section 3.1, with some additional observations. It is 
initially postulated that all three models are equally likely. In other 
words, prior probabilities for each model are 1/3. Part of the network 
is resurveyed a month later to compute initial estimates for each model’s 
parameters. This makes it possible to predict a new observation for the 
next month that maximizes the discrimination function AJ„ (39) for each 
model and selects the possible observation which gives the maximum 
discrimination. In this example three alternatives are possible: 
baselines which are in the north-east direction, baselines which are in 
the south-west direction, and baselines which are in the east-west 
direction. The predicted observation is then performed and posterior 
probabilities for each model are computed using the new observations 
and prior probabilities (29). 

The rest of the experiment continues following this prediction and 
observation procedure until computed posterior probabilities indicate 
that one model is superior to the others (Figures 3c, 3d and 3e). 
Figure 3 shows that the correct model 3 is identified effectively with 14 
baseline observations after 6 months. 

As a comparison, null-hypothesis tests (Ho: Model 3 is the same as 

model 1, Hq: Model 3 is the same as model 2) are performed using the 

estimated parameters obtained by the least squares method at 5% and 1% 
significance levels. Neither hypothesis is rejected at 1% significance 
level until At = 5 months (Figure 3e). In the case of « = 0.05, model 2 
is rejected at At = 5. Model 2 is rejected at At = 6 months at both 
levels. However, the results were ambiguous in the sense that model 2 
and model 3 were still likely candidates until the last measurement was 
performed. This problem is clearly eliminated by the proposed method 
due to the history of accumulated measurements and calculated posterior 
probabilities of each model. 

Meanwhile, the assumptions underlying (39), mainly the normal 
distribution of observations, will not always be satisfied in practice. 
Nevertheless, (32) can still be used to derive different criteria for the 
expected changes in information provided that suitable approximations 
(distributions) are available for evaluating (32). 
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Figure 3. Results of Sequential Experiment for the 
Discrimination of Three Competing Models 
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In addition, H, the set of hypotheses, is assumed to be complete; in 
other words, the true model is in the set of postulated models in the 
given example. This is certainly not the case in practice. On the other 
hand, H may contain a suitable approximation of the true model. In this 
case the discrimination procedure will require more iterations to identify 
this approximation. A stopping criterion needs to be established under 
these circumstances. Also the use of one of the previously outlined 
model discrimination approaches together with this method will provide 
some insight to identify a suitable model when more than one model 
competes after several iterations. If a suitable model cannot be 
identified, then the discrimination procedure continues, as described in 
the proposed algorithmic approach, by postulating a new set of models 
and performing new observations until one model is selected. The 

methodology presented in this section predicts a single observation for 
the following epoch. More observations can be predicted using (39). 
However, in this case, (29) needs to be modified to evaluate the 
posterior probability of each model. 

In any event, the usefulness of the proposed method should be 
investigated further by applying it to more complicated real world 
deformation problems. 
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Chapter 4 

ESTIMATION USING PRIOR INFORMATION 
4.1 INTRODUCTION 

So far, initial design of optimal deformation networks and the 
experimental design problem for the discrimination of possible crustal 
movement models has been discussed within the context of the proposed 
approach. As a result of these steps an operational model which 
describes the differences of repeatedly measured quantities is identified. 
Relevant deformation parameters of this model are then estimated using 
these measurements (sample data). If these estimates are found to be 
satisfactory, then there is no need to go further. As long as predicted 
results derived from this model are in agreement with measurements 
performed at later epochs, it can be used for further investigations. 
Otherwise the model loses its credibility and the whole procedure starts 
again to obtain a better model. 

Meanwhile uncertainties attributed to these parameters may be large, 
for instance, due to the precision of repeated observations. In such 
situations current model estimates can be improved to a certain extent 
by introducing prior information provided by theoretical models and/or 
previous investigations about these parameters. 

Since this information can be introduced into the estimation process 
using different types of adjustment techniques, the last part of the 
proposed algorithm examines some possible alternatives. 

Clearly no method of estimation can be perfect, but one method may 
perform better than the alternatives under a fixed situation. Estimation 
with prior information is no exception. Our main concern in the 
proposed algorithm is with the problem of improving the estimation of 
crustal movement parameters using prior information rather than finding 
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’’the best estimator" which will govern the whole analysis under all 
possible circumstances. It is proposed that different estimators should 
always be considered for the particular problem under investigation and 
decision (selection) should be made by assessing the gain for each of 
the estimators which employs prior information. The results of one 
estimator which is more efficient than the alternatives are then 
accepted. Such an approach is obviously inefficient for practical 
geodetic problems, yet it pays off in crustal movement analysis, where 
improvements in parameters are very crucial. 

In this chapter, first a qualitative criterion to compare different 
estimators is established. Then, three estimators, the well known mixed 
model estimation (Best Linear Uniformly Unbiased Estimation BLUUE), 
Best Linear Estimation (BLE) and Best Linear Unbiased Estimation 
(BLUE), which all use prior information, are examined under the 
established criterion. 

Since the purpose of using prior information is to improve the 
estimation of relevant parameters, the above estimators are compared to 
the Generalized Least Square Estimator (GLSE) which is independent of 
prior information using the criterion of betterness. 

In these comparisons, it is first assumed that observations are 
affected only by a random noise, that the linear model explains the 
phenomena and that the prior information is in agreement with the 
current estimates. 

Today controlled instrument calibration procedures, replication and 
reproduction of observations with different techniques, as well as 
investigation of different models which fit the data, can guarantee to a 
certain extent the validity of the first two assumptions. However, 
possible incompatibilities between prior information and the current data 
are harder to control and modify. This situation is much more likely to 
occur in crustal movement analysis simply because the previous data 
may represent tectonic motions only in the long term average sense or 
it may be a result of purely theoretical considerations. The existence of 
these discrepancies can be detected by null-hypothesis testing. 
However, since the rejection or acceptance of the agreement between 
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data and extraneous information is also a result of established 
confidence levels and some additional assumptions, the rejection of the 
null-hypothesis by no means suggests that such information is totally 
wrong and should be completely discarded from the analysis. 

Therefore, the additional question: "How incorrect can the prior 
information be so that the resulting estimation with prior information is 
still advantageous compared to GLSE?" is examined by deriving the 
necessary conditions for the proposed estimators under biased prior 
information. A testing procedure is first given for the compatibility of 
prior and sample information. 

In this chapter the following linear model is under consideration 

y = Ax + u (la) 

where y is an n*l vector of random variables (differences of observed 
quantities at different epochs). A is an nxm fixed design matrix of full 
rank, x is the mxl vector of unknown parameters and deterministic in 
nature. In other words the lineeur model (la) is the classical functional 
relationship model of geodesy. m<n for an overdetermined system, u is 
the nxl vector of disturbances which are random and unobservable with 

the following assumed properties 

u " Iou> 0 (lb) 

where is the nxn known covariance matrix of disturbances. In 

addition we will assume that an mxl vector of prior information x© is 
available about the parameters x in the following stochastic form 

Xo = X + e (2a) 

where e is the mxl vector of disturbances with the following 
distributional properties 

e ~ (0,Xee). Xee > 0 (2b) 

where is the known covariance matrix of disturbances. Also we will 
consider the case where Xo is the only information available about the 
parameter vector x in a deterministic form (i.e., when 
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Under this setup, (l)-(2), the general problem is the estimation of 
the true parameter vector x in the presence of prior information. 


4.2 COMPATIBILITY TESTING 

Once the model of deformation is identified (discriminatory analysis), the 
next step is to check whether prior and sample information (i.e., the 
estimates obtained from the sample observations only) are in agreement 
with each other. This step can be used as a supporting factor to 
decide if the prior information should be included in the estimation 
procedure. 

Let Xo denote the mxl vector of prior information on the deformation 
parameters and let Xq = x + e where e is the mxl error vector, Igg its 
known covariance matrix and x the vector of true parameters. Then the 
hypothesis Hot "Prior and sample information are in agreement", can be 
tested following the similar lines given in (Theil, 1963). 

Under this hypothesis, if prior information and an estimate of 
parameters of GLSE type Xg are in agreement, then their difference is 
expected to be close to zero, 

(5:= Xo - Xg = e - N“^A Z“iu (3) 

where N:= is the nxn positive definite covariance matrix of 

observations, u is the corresponding nxl error vector, and A is the nxu 
design matrix and assumed to be full rank. The matrix of second 
moments of this difference is 

E(fi(5') = (Egg + N-^) (4) 

since Eee + > 0, (positive definite) E(6fi') can be expressed as 

E(6fi') = (5) 

where Z:= (Egg + N“')* Then it can be shown that 

fi'Z-’i ~ (0,1). (6) 
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If the disturbance terms on sample observations and prior 
information are assumed to be distributed normally, i.e., 

u ~ N( 0 ,Zuu) 

e ~ N(0,Zee) 

then 

6'Z-^ ~ N(0,I). (9) 

Therefore, the scalar 7 := 6'Z~^6 can be used as a test statistic since it 
follows a central x* distribution (Theorem A.IO). 7 is called the 
compatibility statistic (Theil, 1963). If o( is a predefined error 
probability of the first kind, then the null-hypothesis will be accepted 
for 7 < xi-oc 

In the case where Xq is assumed to be strictly correct ilgg •* 0), the 
testing procedure is performed using the compatibility statistic which 
would have the following form 

7 := (Xo - Xs)'N(xo - Xs) ~ Xm (10) 


(7) 

( 8 ) 


4.3 COMPARISON CRITERION FOR ESTIMATORS 

In comparing two estimators, it is necessary to weigh the advantages of 
both estimators against their disadvantages. Obviously, such 
preferences can be expressed in various ways. Each can be best for 
different purposes and a unique universal criterion unfortunately does 
not exist for all conceivable situations. Manipulative conveniences of 
possible alternatives and intuitive arguments about their properties are 
important in choosing a comparison criterion. 

Here the Mean Square Error (MSE) matrix criterion is considered to 
express qualitative properties of the estimators. This is partly because 
it gives fairly simple results but mostly because all unbiased estimators 
turn out to be biased when even one of the assumptions of 
unbiasedness is violated in practice (Bibby, 1977). This property is 
especially important in crustal movement analysis when additional 
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information is introduced into the estimation procedure (we consider 
additional information on the parameters in this study)'. Since there is 
no way to guarantee that the additional information is strictly correct, 
the effect of this information must be exploited for different estimators. 
Therefore an estimator which is not dominated by the erroneous prior 
information (i.e., robust against errors in this information) of its 
alternative can be considered a better estimator. 

Also, we expect that this estimator should improve the results in 
some sense; otherwise there is no reason to introduce additional 
information into the analysis. The MSB criterion will be used to exploit 
these types of properties of different estimators. 

This choice, however, can be criticized due to the fact that the MSB 
matrix involves a bias vector which is in practice rarely known and 
moreover if known can be eliminated. But as it is readily demonstrated 
in the following sections in comparing different estimators, an unknown 
bias may still be used as an effective tool in making inferences about 
the properties of different estimators. 

The comparison criterion is defined as follows. 

Definition 1 ; Matrix valued Mean Square Brror criterion. An estimator 
Xi of X is said to be better than another estimator Xj of x if the 

difference 

MSE(x 2 ) - MSE(x,) (11) 

is positive semi-definite (non-negative definite) or equivalently (> 0), 
where 

MSE(xi) = E(x - Xi)(x - X,)' (12) 

MSE(x 2 ) = E(x - X 2 )(x - X 2 )' (13) 

Now let one of the estimators be chosen as the Generalized Least 

Squares Estimator (GLSE) Xg of x which is obviously independent of 
prior information and is assumed to be unbiased. Let another estimator 
X of X belongs to a class of biased estimators which employs prior 
information or much more importantly it may be biased as a result of 
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erroneous prior information. Therefore, the corresponding MSE matrices 
are given by 

MSECXs) = D(xJ (14) 

MSE(x) = D(x) + bb' (15) 

Here, D(Xg) and D(x) are the mxm dispersion matrices of Xg and x 
respectively and b is the m*l vector of bias. Hence, under Definition 1, 


the estimator x which uses prior information is uniformly better than 
the sample estimator Xg of x if 

MSE(Xg) - MSE(x) = D(Xg) - D(x) - bb' > 0 (16) 

Since 

tr MSE(Xg) = E(x - xg)'(x - Xg) (17) 

tr MSE(x) = E(x - x) ' (x - x) (18) 

the ordinal criterion of betterness given by (16) leads to the following 
cardinal criterion 

b'b < tr[D(Xg) - D(x)] (19) 


The advantages of using the mean square error concept as a 
comparison criterion should now be clear. (19) indicates that the biased 
estimator x of x is superior to the sample estimator Xg of x of GLSE 
type if its total bias (b'b) is less than the total decrease in variances 
of all estimates. In other words, the use of prior information ia 
justified as long as the total bias introduced remains less than the total 
improvements of all parameters. This property forms the logical basis in 
deriving the conditions for the selection of suitable estimators under 
possibly incompatible prior information. 

As we shall see in the following chapters, the mean square error 
matrix of an estimator usually depends on the true parameter vector x. 
Consequently, Definition 1 does not exclude the possibility that for some 
values of x the mean square errors of both estimators will coincide (i.e., 
tr MSE(xi) = tr MSE(xa)). In these particular cases, there is no basis 
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for preferring one of the estimators over the other, although Definition 
1 is still fulfilled. Other properties of the estimators need to be 
considered. 

In the following sections three estimators which employ prior 
information, namely Best Linear Estimator (BLE), Best Linear Unbiased 
Estimator (BLUE) and Best Linear Uniformly Unbiased Estimator (BLUUE), 
are examined using the MSE matrix criterion. They are compared 

against GLSE under correct and possibly wrong prior information. 


4.4 BEST LINEAR ESTIMATION WITH PRIOR INFORMATION 

Theorem 1 : Best Linear Estimation, BLE, (Tautenburg, 1982; Schaffrin, 

1983). Let Gy be a linear estimator of x in the linear model (1). Then 
the optimum value of G (in the sequel the optimum value of G is denoted 
with the same letter to simplify the notation) for which 

E(x - Gy)'(x - Gy) = tr Gl„uG' + tr(I - GA)xx'(I - GA) ' = min (20) 

G 


yields to the BLE x, of x 

x,:= Gy = (1 + x'Nx)~*xx'A' luiy (21) 

where 

N:= A'niA (22) 

G = (1 + x'Nx)“^xx'A'Iui (23) 

The dispersion matrix D(xi) of Xi is given by 

D(Xi) = (1 + x'Nx)~*x'Nx'xx' (24) 

Since bias vector b 

b:= X - E(xi) = (I - GA)x (25) 


is not enforced to be zero in the target function given by (20), the 
resulting estimator is biased and the bias, by substituting (23) into (25), 
is 
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b = (I - GA)x 

(26) 

b = [I - (1 + x'Nx)~^xx' A'XuiA]x 

(27) 

b = + xx')“*x 

(28) 

and the mean square error matrix MSE(xj) 
considering (24) and (28), is 

of the estimated parameters, 

MSE(xi) = D(xi) + bb' 

(29a) 

MSE(xi) = (1 + x'Nx)-»xx' 

(29b) 

Now let us compare this estimator against 
GLSE type. The following theorem holds: 

the sample estimator Xg of 

Theorem 2: BLE Xi of x is better than the samole estimator x^ under 

Definition 4.3.1. 

Proof: Consider 


MSE(Xs) = 

(30) 

and 


MSE(xi) = (1 + x'Nx)-^xx' 

(31) 

then 


A:= MSE(xs) - MSE(xj) = W* - (1 + x'Nx) 

"‘xx' (32) 

4 > 0 iff 


(x'Nx)(l + x'Nx)”^ < 1 

(33) 

(Theorem A.l). Since x'Nx > 0, (33) is always less than one for 

0 < x'Nx < «. This completes the proof. 

Although this estimator is better than the sample estimator Xg, it 
contains the true parameter vector x and it is applicable only if a valid 
prior information x© about the parameters x such that Xo = x is 
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available. This information, however, can not be strictly realized in 
practice. If it is, then there would be no need for the estimation of 
parameters. Consequently Xo to be used in (21) and (29) to replace x in 
these relationships is in general different from the true parameter 
vector (i.e., x© * x). In this case it is necessary to compare this 
estimator versus the sample estimator when the wrong prior information 
is used in the estimation. 

Substituting x© for x in (21), the resulting new estimator is 
given by 

Xi = G©y = (1 + XoNx©)“^x©x©A'i;“5y (34) 

where 

G©:= (1 + xiNx©)“^x©x©A'Iui (35) 

and X© * X. The dispersion matrix D(3Ii) of Xi is now given by 

D(xi) = G©IuJG© = (1 + x3x©)~*x©Nx© -XoX© (36) 

and the bias b© is 

b©:= X - E(xi) = (I - G©A)x 

(37) 

b© = N”^(N~* + x©x©) ^x. 

The mean square error matrix of x, now has the following form 

MSE(xi) = D(xi) + b©b© = (1 + XoNxo)~^x©Nx© -x©Xo + N“*(N“^ + x©x©)“‘ 

• xx'(N“^ + x©x©)-^N“‘ (38) 

Comparing (38) with the MSE matrix of the sample estimator Xg which is 
obviously not affected by any type of prior information results in the 
following relationship 

MSE(Xg) - MSE(xi) = N“* - D(x,) - b©b© (39) 

= N~^ -(1 + XoNx©)“*x©Nx© -x©Xo - N”‘(N“* + x©Xo)~‘ 

•xx'(N“^ + x©Xo)~‘N“‘ (40) 
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It can be shown by using Theorem A.l that the difference of the first 
two terms in the above equation is always positive definite. Now let 

B:= - D(xi) = - (1 + XoNxo)“^XoNxo * XqXq (41) 

then using Theorem A.l again, (39) is p.s.d. iff 

x'N”^(N”^ + XoXo)'”^B“^ (N”^ + XoXo)”^N^^x < 1 (42) 

This expression reduces after some simple but lengthy manipulations to 

x'(N“^ + 2xoXo)“^x < 1 (43) 

Hence the following theorem is proven. 

Theorem 3 : The necessary and sufficient condition for the biased 

estimator Xi under wrong prior information to be better than the sample 
estimator Xg for all Xo ^ x with respect to Definition 4.3.1 is 

x'(rP^ + 2xoXo)‘'^x < 1 (44) 

The implications of this theorem will be examined at the end of this 
chapter together with the other theorems which will be derived in the 
following sections. 

Let us now examine the behavior of bias introduced by the \jse of 
wrong prior information when the above condition holds. The quadratic 


bias, from (37), is given by 

b;bo = x'(fT' + xoXo)“'rr^fr^(fr^ + xoXo)'”'x (45) 

which can be expressed, after a simple transformation, as 

bobo = z"(I + ZoZo)'“^N“^ (I + ZoZo)”^z (46) 

where 

z:' N^x (46a) 

Zq:= N^Xo (46b) 
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Hence, the condition for betterness under erroneous prior information 
(Theorem 3) takes the following form 

z'(T + 2zoZo)~*z < 1 ^47) 

The following corollary holds. 

Corollary 1 : The quadratic bias due to the incompatible prior 

information under the condition of betterness (44) is bounded. The 
supremum of the total (quadratic) bias as a result of the wrong prior 
information is equal to or less than the largest eigenvalue of N“*, 
i.e., 

sup z'(I + ZoZo)“^N“*(I + ZoZo)“^z < (48) 

s.t.: z'(I + 2zoZo)~*z < 1 (49) 


Proof : Let N~* = FAF' be a spectral decomposition of the symmetric 

matrix N“* and also let 

d: = F(I + ZoZo)“^z (50) 

then 

d'd = z'(I + ZoZo)~*(I + ZoZo)~*z (51) 

so that 


sup bobo = sup d'Ad = sup j X^df 

i = i 


(52) 


where Xj are the eigenvalues of N *. If these eigenvalues are written 
in descending order then the above equation (52) satisfies 

sup I Xjdj < X,„ sup I d< (53) 

1=1 1=1 


Using Theorem A.7, it can be shown that 


z'(I + 2zoZo) *z - d'd > 0 


(54) 
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therefore 


X„ sup Z d? < sup z' (I + 2zqZo) ‘z < X„ (55) 

i = x 

since z'(I + 2zoZo)~‘z < 1. This completes the proof. 

It should be noted here that the boundedness in the quadratic bias 
also implies the boundedness of its elements (Voievodine, 1976). 


4.5 BEST LINEAR UNBIASED ESTIMATION WITH PRIOR INFORMATION 
In this section another estimator, namely Best Linear Unbiased Estimator 
(BLUE), which is recently derived by Schaffrin (1983), is examined. The 
estimator and the relevant statistics are stated. Since the purpose of 
using such an estimator, in the scope of the proposed algorithm, is to 
improve the estimates, it is compared against GLSE. This estimator, 
similar to BLE, requires additional information x© about the parameters 
to be estimated. Comparisons are also made for the case when the prior 
information is wrong. The main difference between BLE and BLUE is 
that the unbiasedness condition is weakly introduced into the target 
function given by (20) which appears to be a novel in statistical 
literature. 

Theorem 1 : Best Linear Unbiased Estimation, BLUE, (Schaffrin, 1983). 

Let Gy be a linear estimator of x in the linear model (1). Then the 
optimum value of G for which 

tr D(Gy) = tr G (56) 

G 

subject to 

b:= X - E(Gy) = (I - GA)x = 0 (57) 

in the sense of, 

tr[GZu„G' - 2(1 - GA)xX'] = min (58) 

G 
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where X is a Lagrange multiplier vector and D(Gy) is the dispersion 


matrix of Gy, yields to the BLUE X 2 of x 

X 2 := Gy = (x'Nx)~‘xx'A'Iuiy (59) 

where 

G = (x'Nx)“*xx' A' Z 3 I (60) 

N:= A'l-JA (61) 

The dispersion matrix D(x 2 ) of X 2 is given by 

0 (^ 2 ) = (x'Nx)“‘xx' (62) 

Since the bias b 

b: = X - E(x2) (63) 


is enforced to be zero in the target function (56), the resulting 
estimator turns out to be unbiased, that is 

E(x 2 ) = e[(x'Nx)“*xx' A' luiy] = (x'Nx)~‘xx' A'l^jAx = x (64) 

Therefore, the mean square error matrix of the estimate is 

MSE(x 2 ) = D(xa) (65) 

Since the unbiasedness condition is introduced through (57) which 
has a solution only for the scalar multiples of x, the resulting estimator 
is weakly unbiased (see (Schaffrin, 1983) for the general concept of 
unbiasedness). If the condition of unbiasedness is introduced through 
the homogeneous equation (I - GA) = 0 which obviously holds for any x, 
then the resulting estimator is the well-known best linear uniformly 
unbiased estimator (or equivalently GLSE) which is independent of prior 
information. 

Note that this estimator reduces to GLSE for the univariate case 
(i.e., m = 1). 
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Proposition 2 : BLUE Xq of x is better than the sample estimator Xg of x 

with respect to Definition 4.3.I . 


Proof : Consider the differences of the dispersion matrices of both 


estimators 

A:= D(xg) - D(xa) (66) 

then from (62) 

A = N“* - (x'Nx)”*xx' (67) 

which can also be written as 

A = rr^[l - N^(x'Nx)-^xx'N^]N~’i (68) 

therefore A > 0 iff 

I - (x'Nx)-^N’^xx'N^ > 0 (69) 


which holds as a result of Theorem A.l. This completes the proof. 

Similar to BLE, this estimator is a function of the true parameter 
vector X and it is applicable if valid prior information x© ® x (x© is 
proportional to x, i.e., x© = c*x where c is a real number) is available. 
Again this condition cannot be achieved in practice. Therefore the 
resulting estimates are no longer unbiased since x© ft x (x© is not 
proportional to x) in general. This biased estimator is now given by 

X 2 = G©y = (xoNx©)“‘x©x©A'I“iy (70) 

where 

G© = (xoNx©)“*x©x©A'Z“* (71) 

with X© ^ X. The dispersion matrix D(% 2 ) of X 2 is, from (70) 

D(x 2 ) = (x©Nx©)“‘x©Xo (72) 


57 



and the bias bo caused by the wrong prior information is 


bo:= X - E(x2) 


(I - GoA)x = 


XoXoN 

x^NxoJ’^ 


( 73 ) 


It is clear that the dispersion matrix of the estimate does not fully 
measure the effect of the bias caused by the introduction of erroneous 
prior information. We therefore consider Definition 4.3.1. Considering 
(72) and (73) 


4:= MSE(Xg) - MSE(xa) = D(Xs) - D(xa) - bobo 
A = B - BNxx'NB 


(74) 


where 

B:= N~» - (x;Nxo)~‘xoxi (75) 

Since B > 0 (Theorem 2), it can be represented as B = B^B^ (Theorem 
A.2), then 

A = B’^d - B%xx'NB’^)B’4 (76) 

and this expression is p.s.d. iff 

I - B^xx'NB^i > 0 (77) 

Using Theorem A.l, (77) is p.s.d. iff 

x'NBNx < 1 (77a) 

which can also be expressed, substituting (75), as 


x'Nx 


(x'Nxq)^ 
XoNxo ^ 


(78) 


To explore the behavior of the above inequality under varying wrong 
prior information, consider the extrema of the second quadratic form 
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(x'Nxo)* 

xiNxo 

Xo 


= x'Nx 


(79) 


as a result of Theorem A.3. This is the case when the prior information 
is correct. On the other hand 


. (x'Nxo)* . ^ ZoCZo 
inf —7Z7~. = inf — = X„ 


Xo 


XoNxc 


2o 


ZqZc 


(80) 


as a result of Theorem A.4, where 

C:= N’^xx'N’i (81) 

Zo = N^Xo (82) 

and is the minimum eigenvalue of C. Since rank(C) z 1, X„ = 0. 
Therefore 


inf 

Xo 


(x'Nxq)^ 

XoNxo 


= 0 


(83) 


which, together with (79), leads to the following theorem. 

Theorem 3 : A sufficient condition for the biased estimator Xj to be 

better than the sample estimator Xg for all Xo x with respect to 
Definition 4.3.1 is 

x'Nx < 1 (84) 

Corollary 1 ; If u N(0,<r2l), then a sufficient condition for the biased 
estimator X 2 to be better than Xg for all Xo ^ x is 

^ x'A'Ax < 1 (85) 

which holds basically when the signal to noise ratio 


m xf 

i = i 


( 86 ) 
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is sufficiently small. 

Again leaving further implications of the above results to the end of 
this chapter, let us examine the amount of bias committed by the use of 
incompatible additional information. From (73) and (82) the quadratic 
bias is given by 


bnbo - z" 


I - 


ZqZ( 

ZoZ 


N-HI - 


O'^OJ 


ZqZq 

ZqZo, 


(87) 


and the condition of betterness (84), after a simple transformation, is 


z'z < 1 


where z:= N^*x. Now observe that 


I - 


ZfiZr 


ZnZr 


I - 


ZfiZf 


ZnZf 


- I T ^ 0^0 1 
^ ZoZo ^ 


( 88 ) 


(89) 


and 


sup z' 
Zo 


I 


ZqZq 

ZqZo. 


z 


z'z 


(90) 


Considering the spectral decomposition of N“* 
ir‘ = CAC' 


(91) 


where C is an orthogonal matrix and A is the m * m diagonal matrix of 
Aj eigenvalues of N“*, equation (87) can be written as 

bobo = d'CAC'd (92) 


where 


d' - 


z' 


I 


ZqZq 

ZqZq 


Now, from (92) 


(93) 


bobo = I Aid? 
1 = 1 


(94) 
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then 


sup bobo = sup I \<df < sup J, d? (95) 

2o <=* *=' 

where X„ is the largest eigenvalue of N”‘. Substituting (89) and (90) 
into the last term in (95) and considering (88) 

sup bobo < X„ (96) 

z'z < 1 

is obtained. Therefore, the following corollary is proven. 

Corollary 2 : The total bias due to the wrong prior information is 

bounded under the condition of improvement given by (84). The 
supremum of it is equal to or less than the largest eigenvalue of N"*, 

l*0t y 


sup bibo < X„ (97) 

s.t. : x'Nx < 1 


4.6 BEST LINEAR UNIFORMLY UNBIASED ESTIMATION WITH PRIOR INFORMATION 
In the previous two sections, the two estimators examined require prior 
information on the parameters to be known in an exact sense. There 
may be situations, however, that an independent "stochastic" auxiliary 
information vector Xo on x is available, i.e., 

Xo = X + e 

e ~ (0,Zee) . Zee > 0 (98) 

where e is the mxl vector of disturbances on x© and Igg is the mxm 
covariance matrix. The availability of this information leads to the 
following mixed model 

y = Ax + u 
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(99) 



where 


y: = 




( 100 ) 







_ 

_ 


u 



0 


Zuu 

0 

r — 





9 



~ • ^uu 

e 



. 0 . 


0 

Zee . 



( 101 ) 


Here y, A, x, u and Xuu have the same meaning as defined in the linear 
model given by (1). The problem of estimation in this mixed model is 
well known. In this section first the Best Linear Uniformly Unbiased 
Estimation (BLUUE) technique is summarized. It is then compared 
against the GLSE. Results are due to Terasvirta (1979). However, the 
amount of bias committed by the use of extraneous information given by 
Corollary 2 seems to be novel. 


Theorem 1 : Best Linear Uniformly Unbiased Estimation, (Theil and 

Goldberger, 1961). Consider the linear model given by (98) - (101). Let 
Gy be a linear estimator of x in this model, then the optimum value of G 
for which 


tr D(Gy) = tr GlyuG' = min (102) 

G 

s.t. : (I - GA) = 0 (103) 

for all X, yields to the BLUUE X3 of x 

X 3 = (I + IeeN)~HZeeA'Zuiy + ) (104) 

whose dispersion matrix is given by 

D(X3) = (lek + N)-‘ (105) 

where N:= A'Xj^A. 


Since the effect of additional information is expected to improve the 
estimates, the following theorem is of interest. 
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Theorem 2 ; The BLUUE X 3 of x is a better estimate Xg of x in the sense 
of Definition 4.3.1. 

Proof ; Since both estimators are unbiased, Definition 4.3.1 turns out to 
be a variance comparison. From (105) 

D(Xg) - DCxa) = N-^ - an + N)-» (106) 

This expression is p.s.d. as a result of Theorem A.7. This completes the 
proof. 

The unbiasedness assumption, however, can hardly be realized in 
practice. The compatibility of auxiliary information with the information 
implied by the observations is especially difficult to achieve. Since the 
current observations are performed under controlled circumstances, it is 
expected that incompatibility is more likely to be due to the additional 
information. 

Assume that the true but unknown form of the additional information 
on the parameters is in fact 

Xo = X + s + e (107) 

where s * 0 is an mxl vector of possible deviations (systematic errors) 
and deterministic in nature. This in turn implies that, from (104) 

X 3 = (I + IeeN)~UZeeA'i:uiy + Xo) (108) 

where Xo is now given by (107). Since 

E(xo) * X (109) 

this estimator is no longer unbiased, and it will be denoted by to 

emphasize the bias . It is easy to show, using (107) and (108), that 

E(X3) = X + (I + ZeeN)-»S (110) 

Hence the last term is the bias caused by the wrong prior information 

b = (I + EeeN)"‘s (111) 
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Now the next question is, "under what conditions can wrong prior 
information still be used to improve the estimates?". Following Definition 
4.3.1, the biased estimator X 3 dominates the sample estimate Xg iff 


MSE(Xg) - MSE(x3) > 0 

( 112 ) 

tr[MSE(Xg) - MSE(x 3 )] > o 

(113) 

tr[D(Xg) - D(x 3 )] - s'B'Bs > o 

(114) 


where B:= (I + IeeN)“‘* Now using Theorems A.l and A.9, (113) and 
( 112 ) hold iff 

s'dee + < 1 (115) 

This form leads to the following corollary. 

Corollary 1 : (Terasvirta, 1979). A necessary and sufficient condition for 
the biased estimator X 3 to be superior than the sample estimator Xg in 
the sense described by Definition 4.3.1 is 

s' dee + < 1 (116) 

Let us now examine the amount of bias introduced by the use of 
extraneous information under the above improvement condition. 

Corollary 2 ; The total bias due to the wrong prior information under 
the condition of improvement (116) is bounded. The supremum of it is 
equal to or less than the largest eigenvalue of (I + Nlee)“‘N“^, i.e., 

sup b'b < X,„ 

Xo 

(117) 

S.t.: s'dgg + N“‘) *s < 1 

Proof ; After a simple transformation the total bias, from (111), and the 
condition of improvement, from (116), can be written as 

b'b = Zoilee + N-M'^^N-'N-^Zee + N~‘ )"^Zo (118) 
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ZoZo < 1 


(119) 


where Zo'.- s'(Zge + then, 

sup b'b = 

Zo 

( 120 ) 

s.t.: ZoZo < 1 

as a result of Theorem A44, where X„, is the largest eigenvalue of 

(Zee + N-M"’^N-»N-i(Zee + (121) 

Since 

tr[(Zee+N“')"’^ir^jr»(Eee+N-M"’^] = tr [ ( I+Nl^e )"^N"‘ ] = ? X, (122) 

1 = 1 

the largest eigenvalue of (122) is equal to the largest eigen- 
value of (I + NJlgg)~*N~^. This completes the proof. 


4.7 FURTHER DISCUSSIONS 

In the previous sections, three different estimators which employ prior 
information were examined. Since the purpose of using prior information 
about the parameters is to improve the sample estimates (i.e., the 
estimates which are obtained without priors) according to the proposed 
algorithmic approach, they were compared against the GLSE technique in 
order to determine the effect of this information. Comparisons were 
made using the MSE matrix criterion. It was shown that introduction of 
prior information improves the results according to the MSE matrix 
criterion (except BLUE which reduces to GLSE in the case of univariate 
application) for the case when prior information is strictly correct. 
Comparisons were also made when prior information is not compatible 
which is the case in practical applications. Improvement conditions over 
GLSE were derived for this case. Results are summarized in Table 1. 

In this section these results are interpreted within the scope of the 
proposed algorithmic approach. First these estimators are compared 
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TABLE 1: ESTIMATORS WITH PRIOR INFORMATION 
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IMPROVIMBNTS 































with respect to each other under correct prior information; then it was 
shown that direct comparisons are not possible when prior information is 
not compatible. In this case their corresponding improvement conditions 
can be used to select a suitable estimator. 

Now consider BLE, from (21) 

ill = (1 + x'Nx)“* xx'A'Zuu y (123) 

This estimator is better than GLSE (Theorem 4.4.2) but not practical as 
it is, since it is a function of the true parameter vector x. To overcome 
this difficulty several approaches are given in statistical literature. For 
instance, Farebrother (1976) replaces x with kg of GLSE type. Vinod 
(1976) iterates the Farebrother estimator and gives an analytic solution 
to this iterative form. Another estimator results by postulating the 
linear model (1) as y : Ax + ku where k is an arbitrary number and 
modifying (123) accordingly. Ullah and Ullah (1978) introduce in (123) 
two constants ki, k 2 where ki>0, ka are arbitrary stochastic or 
nonstochastic scalars (double k-class estimators). They also show that 
for particular values of ki and ka this new estimator reduces to a 
James and Stein Estimator (1961). 

Common in all the above approaches is the use of sample 
information. However if valid prior information X n which is independent 
of the current observations is available about the true values of the 
peirameters, then (123) can readily be exploited by substituting Xo for x. 
In this case the usefulness of the resulting estimator depends on how 
close Xo is to x. In the meantime since the purpose of introducing prior 
information, according to the proposed approach, is to improve the 
estimates, the introduction of prior information is not only helpful in 
reducing the MSE of estimates in an algebraic sense but also meaningful 
in the sense that this information may contain a component of reality 
which is not captured by the sample observations. 

If this information x© is available, then (123) can be written as (we 
denoted this estimator as when x©=x, and Xj when x© ^ x in the 
previous sections) 
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= (1 + Xo'Nxo) ‘ XoXo' A'lui y 


(124) 


(125) 

(126) 

which indicates that this estimator provides a shrinkage of the sample 
estimator Xg of GLSE type, and since 

E(Si) = o< • X (127) 

it also underestimates the true parameter vector on average, and it is 
therefore a biased estimator. Yet it is better than the GLSE (Theorem 
4.4.2) provided that Xo is strictly correct (i.e., Xo = x). Obviously this 
information is not known strictly in practice. In this case (when Xo t 
x), uniform improvements over GLSE by using x© cannot be guaranteed. 
Nevertheless it is still useful if the general improvement condition which 
was derived in section 4.4 holds. The following considerations lead to 


simpler interpretations. From (44) 

x' (N"‘ + 2xoXo')"'x < 1 (128) 

Since 

N - (N“^ + 2xoXo')"‘ >0 for Xo 5 ^ 0 (129) 

an upper bound for (128) with respect to Xo is 

x'Nx < 1 (130) 

For the sake of simplicity assume that disturbances are 

u ~ (0, <ru*I) (131) 


Substituting (131) into (130) and using the spectral decomposition of 
(A'A)“S (130) can be replaced by 


For the univariate case (124) reduces to 

Si = OCSg 

where 

o< := 1 + ~ , 0<o(<l for all Xo 

^ Xq n J 
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2 


(132) 


. ^ v: 

where Xj are the eigenvalues of (A'A)~‘, are the elements of t = C'x. 
This expression indicates that the prior information will improve the 
results when 

(i) The signal-to-noise ratio (x'x/<T^J^) is sufficiently small. 

(ii) The component of signal x is not very pronounced when X^ is 
small. 


Consider now BLUE given by (59) and which can also be expressed 
as 


Xa 


xx'N A 
x'Nx 


(133) 


Since the elements of xx'N /x'Nx are bounded, BLUE displays similar 
properties as BLE. Again prior information x© about x needs to be 
known in order for this estimator to be applicable. In addition, it is 
unbiased if Xo is proportional to x (x© « x). However, it reduces to 
GLSE for the univariate case and thereby offers no improvements. If 
BLUE is compared to BLE when Xo=x under Definition 4.3.1 


MSE(xa) - MSE(xi) = D(Xa) - MSE(xi) = (xo'Nxo)-‘(l + Xo'Nxo)“‘xoXo' 

(134) 

is obtained. This expression is p.s.d. due to its quadratic nature. 
BLUE is therefore not as efficient as BLE over GLSE under strictly 
correct prior information. On the other hand, if this information is not 
strictly correct then it is rather difficult to establish the superiority of 
one estimator over the other by examining their corresponding MSE 
matrices. From (38), (39) and (68), (69). 

MSE(x 2 ) - MSE(xi) = D(X 2 ) - D(x,) + bjbj - b,bi (135) 

Since the matrices which appear in (135) are of rank one and they 
are not a linear combination of each other, (135) is indefinite in general. 
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In this case improvement conditions for both estimators are more 
informative. Considering (84) and (44) these are 

x'Nx < 1 for BLUE (136) 

x' (N”^ + 2xoXo')~’ X < 1 for BLE (137) 


Since the difference (136)-(137) is larger than zero for x© 0 as a 
result of Theorem A.7, the improvement condition for BLE is 
comparatively easier to achieve than the improvement condition for 
BLUE. 

Let us now consider a general property of shrinkage estimators 
which can also be observed in BLE and BLUE. These estimators can be 
expressed in the following forms respectively 


= di'Xo 
= d2*Xo 

where 

d .= X/N51, 

* ' 1 + Xo'Nxo 

^ ‘ Xo'Nxo 


(138) 

(139) 


(140) 

(141) 


and Xg is the GLSE of x, Xo is the prior information vector about x as 
previously defined. Since di and d 2 are scalar quantities which operate 
on Xo in the above representations, the usefulness of BLE and BLUE 
depends closely on the relative magnitudes and directions of the 
components of x© with respect to the components of the sample estimates 
Xg (or equivalently x). If the elements of Xo are appropriate on these a 
priori grounds, BLE and BLUE give reasonable estimates. On the other 
hand, if the elements of Xo are different in magnitude and directions 
with respect to the elements of Xg, the null hypothesis testing 
procedure given in section 4.2 can easily detect these inconsistencies. 
This can be seen directly from the test statistics given by 4.2.10 as 


7 := (xo - (xo - ilg) ~ 


(142) 
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y will be larger when Xo is different than Xg in direction and magnitude. 
In this case it is dangerous to use these estimators except when 
deformations are below the noise level of observations as implied by the 
improvement conditions derived for BLE and BLUE. If these conditions 
hold, the direction and the magnitude of Xo are no longer relevant 
because Xg in (138) and (139) will shrink the estimates toward zero. 

Meanwhile improvements over GLSE can also be obtained if the prior 
information about the parameter vector to be estimated is stochastic in 
nature. In this case BLUUE can be used to introduce this information. 
This technique corresponds to the well-known "least square solution in 
the case of observation equations with weighted parameters" (Uotila, 
1980). In section 4.6 this technique was examined within the scope of 
the proposed algorithmic approach. It was shown that introduction of 
prior information improves the results with respect to GLSE. The 
condition of improvement under incompatible prior information was given. 
Results are due to the discussion paper of Terasvirta (1979), except 
Corollary 2 which is about the amount of total bias committed by the use 
of incompatible prior informationi is new. 

Corollary 1 provides additional insights about the properties and 
usefulness of this estimator. From (116) the improvement condition 
reads as 

s' (lee + N-‘)-‘s < 1 (143) 

the presence of s in the above condition indicates that the magnitude of 
systematic error vector (s) is important for (143) to hold rather than 
the amount of true parameter vector x in the case of BLE and BLUE. 
Therefore, this estimator is not robust against incompatible prior 
information. The stochastic nature of prior information, however, 

compensates for this disadvantage of BLUUE as we shall demonstrate 
using the following simplifying assumptions which also provide further 
insight about the properties of BLUUE under incompatible prior 
information. 
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If 

u ~ (0, ■ (144) 

e ~ (0, (145) 

k := ffu* / (146) 

where Wy* and are the a priori variances of the disturbances and 

the additional information respectively, the (143) reduces to 

<r-* s' [k-‘I + (A'A)~M~‘s < 1 (147) 

Using Theorem A. 5, (A'A)= CAC' , (147) can be written as 

I h < 1 (148) 

k~* + 


where t< are elements of t = C's and are the eigenvalues of (A'A)~*. 
Therefore, (143) under the assumptions given by (144) and (145), is 
likely to hold when 

(i) Bias-to-noise ratio (s's/<T^J^) is sufficiently small 

(ii) X< are large 

(iii) *Oj is not erroneous (i.e., Sj is not too large) in directions where 
there is little information (i.e., when Xj is small) 

(iv) k~* is large (i.e., when is sufficiently small or o-g* is large. 

Due to the stochastic nature of prior information a direct analytical 
comparison of BLUUE with BLE and BLUE which use prior information in 
a nonstochastic sense is not meaningful. Conditions such as (130) and 
(143) allow the experimenter to make qualitative inferences about these 
estimators. The following upper bounds for (130) which hold both for 
BLE and BLUE, and (143) which holds for BLUUE can easily be proven. 
Considering the minimum eigenvalues of (A'A)“* these are 

^ < 1 ni9) 

“^101 n ^11 

^ < 1 (150) 

Xmin 
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where X„nn is the minimum eigenvalue of (A'A)“*. 

Since and is the same in both expressions the preference 

of one estimator over the other depends on the other terms. If the 
magnitude of the parameter vector x is expected to be small enough (in 
the scope of crustal movement analysis this corresponds to very small 
deformations) with respect to the magnitude of possible biases (s) in Xo> 
BLE and BLUE are preferable over BLUUE. However, if there exist 
pronounced deformations and the observations are precise enough ( 149 ) 
cannot be fulfilled. Therefore the usefulness of these two estimators 
are limited to the amount of signals. 

On the other hand, the advantage of BLUUE over the other 
estimators lies in the fact that the experimenter can control the effect 
of systematic errors by the gradual reduction of its effect through its 
uncertainty crj in X3. In addition, the above expressions allow the 
experimenter to achieve this efficiency without extensive simulation 
studies. 

The advantages of using stochastic prior information can further be 
exploited by the introduction of the concept of random parameters. This 
interpretation opens the possibility of employing another group of 
estimators which use prior information about the expected values of 
stochastic deformation parameters. This is the subject of the following 
chapter. 
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Chapter 5 

PREDICTION USING PRIOR INFORMATION 


5.1 INTRODUCTION 

In our consideration of the improved estimation problem in the previous 
chapter, we have assumed that the deformation parameters to be 
estimated were fixed (deterministic) though prior information available 
about them was stochastic or fixed in nature. 

The interpretation and availability of stochastic prior information as 
well as its possible discrepancies with the information obtained from 
sample observations (for instance, the difference between relative plate 
velocities implied by the long-term average models of global tectonics 
and the ones which are deduced from the current local and regional 
strain accumulation measurements for the Pacific and North American 
plates indicated by (Lyzanza et al., 1985)) suggest that these parameters 
themselves may act as random variables which can take on different 
values at different places and at different times. Within this 
interpretation, if a realistic density function can be postulated for these 
random variables, the problem of improved estimation in the presence of 
prior information, which is in this case about the expected values of the 
parameters, can also be examined using the Bayesian inference methods. 
In other words, prior information with its prior probability combined 
with a likelihood function using Bayes* Theorem yield the estimates of 
deformation parameters and their posterior probabilities. 

Existing knowledge about the nature of crustal movements, 
unfortunately, is not sufficient (at least for today) to postulate a 
realistic density function about these deformation parameters. 
Nevertheless, the estimation problem with prior information can also be 
formulated independent of the density function of random deformation 
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parameters using the sampling theoretic approach. 

In this framework two approaches are identified. The conventional 
approach which estimates the mean (expected values) of stochastic 
parameters and the estimation of random variables themselves which may 
be relevant when the short-term realization of random deformations is of 
interest. Estimation of stochastic parameters is also called prediction 
(Rao, 1965; Schaffrin, 1983). 

In this chapter we formulate the improved estimation problem as 
follows. Deformation parameters are stochastic in nature. They exhibit a 
temporal and spatial behavior which fluctuates about an a priori known 
fixed mean and an a priori known dispersion. Although stochastic 
deformation parameters cannot be observed directly, they can be 
realized from another random vector (such as baseline differences) 
which can be obtained through observations (repeated baseline 
measurements). The estimation of random variables themselves combined 
with the prior information about their means is under consideration. 

Three prediction techniques, namely: Best Homogeneously Linear 

Prediction (HOMBLIP), Best Homogeneously Unbiased Prediction 
(HOMBLUP), and Best Inhomogeneously Unbiased Prediction (INHOMBLUP) 
which are proposed by Schaffrin (1983) for the estimation of random 
deformation parameters in the presence of prior information, are 
examined in this chapter. 

Since the purpose of introducing prior information in the proposed 
approach to crustal deformation analysis is to improve the estimation of 
random parameters, they are compared against a reference estimator 
which uses only observations. This predictor, as it is demonstrated in 
section 5.4, turns out to be GLSE. On the other hand, prior information 
about the expected values of random parameters are not known in 
practice with certainty. Therefore the effect of noncompatible prior 
information needs to be examined as we did in the previous chapter for 
the functional relationship linear model. 

Although the assumption of random deformation parameters seems to 
be intuitively plausible in crustal movement analysis, especially for the 
plate tectonics models where additional information about the plate 
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motion parameters are derived from data which spans over geological 
time intervals, it is also possible to check the validity of the following 
model using the principle of the likelihood ratio developed by Rao 
(1965). 

In this chapter the following linear model is under consideration 

y = Ax + u (la) 

where y is an nxl vector of random variables, A is an nxm (m<n) fixed 
design matrix of full rank, u is an n*l vector of disturbances which are 
random and unobservable, x is the mxl vector of unknown stochastic 
parameters, u and x have the following distributional properties 

u ~ (0,luu). Iuu>0 (lb) 

X ~ (m»Ixx)» Ixx> 0 (Ic) 

E(ux') = 0 (Id) 

where is the nxn covariance matrix of disturbances, /i and Z^x 
the mxl vector of the expected values of x and mxm covariance matrix of 
the random (stochastic) parameter vector x respectively. It is assumed 
that the random parameter vector x and disturbances u are statistically 
independent. As a consequence of (la)-(ld), the following relationships 
hold 


E(y) = Am (2a) 
E[y - E(y)][y - E(y)]' = (2b) 
E[x - E(x)][y - E(y)]' (2c) 


Under this setup, (l)-(2), we shall consider the problem of estimating 
(predicting) the random variable x in the presence of prior information 
about ^ and ^xx* 


5.2 COMPATIBILITY TESTING 

If the model parameters are interpreted as random variables with certain 
mean and variances available to the experimenter as a result of prior 
investigations, then this information and the information implied by the 
sample observations can be tested, as in the case of deterministic 
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parameters, to decide whether they are compatible. This step can be 
used to check the validity of prior information about the expected 
values of random parameters before they are introduced into the 
improved estimation procedure within the scope of the proposed 
approach. In this section a null hypothesis testing procedure is 
developed for this purpose. 

Consider the linear model given by (l)-(2). Let /lo denote the 
known prior information about the expected values of random parameters 
and Xxx ^he known positive definite mxm covariance matrix of x. 
Then the null hypothesis H© is 

Ho* A*o - E(x) (3) 

Lemma 1 : (Rao, 1965) Let p := Gy + d be an inhomogeneous linear 
estimator of := E(x) in the linear model (l)-(2). Then the optimum 


values of d, G for which 

i) E(// - At) = 0 for all values of /t (4a) 

ii) E(At - A) (m ~ A) " is a mininuin (4b) 

are 

d = 0 (5a) 

G = N"‘ A'l~l y (5b) 

which imply 

A := Gy + d = N~* A'Zui y (5c) 

The dispersion matrix of A i® given by 

D(A) = (Ixx + N-») (5d) 


where N := A'^ui^* 

In addition, if x N(Aa*, Zxx)» u ^ N(0,Zuu) and x and u are 
stochastically independent, as implied by (2), then 
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i) A = n->a'IU y " N(/i, + N-*) 

ii) A is inference sufficient for the parameter fx 


(6a) 


(6b) 

Therefore the best inhomogeneously linear estimator }1 ot fi with 
respect to the conditions (4a)-(4b) is the same as GLSE. But the 
dispersion matrix of the estimator is different (note that this estimator 
is independent of p). 

Proposition 1 ; The following quadratic form is x* distributed with m 


degrees of freedom as a result of Lemma 1 

(m - A)'(Zxx+N~M-‘ (m - A) " Xm" (7a) 

Proof : Let 

6 : = IX - /X (7b) 

then from (5c) and (la) 

6 = IX - X - W-^A'lZt u (7c) 

It is easy to show that 

6 ~ (0, + N~‘) (7d) 

Since Ixx + N“‘ > 0, it can be expressed as (Theorem A.2) 

Zxx + (7e) 

where 

Z^ = (Zxx + (8a) 

and 

Z~^ 6 ~ (0, I) • (8b) 

Using Theorem A. 11 

Z"’^ 6 ~ N(0, I) (8c) 
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and as a result of Theorem A, 10 


= (m - A)' (Ixx + (m - A) - Xm" (8d) 

This completes the proof. 

Now substituting for in (8d) we obtain a test statistic to check 
the validity of prior information /io about the expected values of x, 
before it is introduced into the estimation process* If <y is a predefined 
error probability of the first kind, then the null hypothesis Ho will be 
accepted for 6 'Z-‘<5 < Note the interesting structural duality 

with the testing procedure in section 4.2 of the functional relationships 
model. 


5.3 COMPARISON CRITERION 

The statistical theory used in the previous section for comparing 
different estimators can also be extended to cover the linear prediction 
problems. 

The comparison criterion is formulated as follows. 

Definition 1 : Let x be an m x 1 vector of random variables such that 

E(x) and let x denote the corresponding predicted values. The 

matrix valued Mean Square Error of Prediction (MSEP) of x is then 
defined by 

MSEP(x):=: E(x ~ x)(x - x)^ = D(x - x) + bb' (9) 

where b is the bias vector of prediction and is defined as 

b : = E ( X ~ X ) ( 9a ) 

(Schaffrin, 1983). Now the comparison criterion is formulated. 

Definition 2 (MSEP Criterion): A predictor Xi of the random variable x 

is said to be better than the another predictor X 2 of x if the difference 
matrix 
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MSEP(xj) - MSEP(xi) 


(10) 


is positive semi definite (p.s.d.)> where 

MSEP(xi) = E(x - x,)(x - Xi)' (11) 

MSEP(xa) = E(x - X 2 )(x - Xa)'- H.2) 

Consider now the following case. Let Xg be an unbiased predictor 
of X, and furthermore let this predictor employ only the sample data. 
Let X be another predictor of x, possibly biased. If Definition 2 holds, 
then 

b'b < tr[D(x - Xg) - D(x - x)] ' (13) 

where b is the bias vector of x. This property is similar to (4.19) and 

forms the logical basis of using MSEP criterion. Now, as in Section 4.3, 
it is necessary to define a reference predictor. This is the topic; of the 
following section. 


5.4 PREDICTION USING SAMPLE INFORMATION ONLY 

Consider the linear model described by (1) - (2) with the exception that 
no prior information is available about the random parameters. The 
following theorem defines the sample predictor. 

Theorem 1: Let Gy be a linear predictor of the random parameters x in 
the linear model described by (1) - (2). Then the optimum value of G 
(this value is denoted by the same letter for the sake of simplicity in 
the notation) for which 

E(x - Gy)'(x - Gy) = min •; 14' 

G 


subject to 

E(x - Gy) = (I - GA)m = 0 


(15a) 
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for all /i, that is, 

I - GA 0 
is 

G - (A'I-iA)-‘A'Z-i 

Xs:= Gy = (A'IuiA)“‘A'I“iy. 

The dispersion matrix of the predictor Xg is 

D(Xg) = D(x - Xg) = (A'ZuiAr^ 


Q5b) 


(16) 


(17) 


Proof : Considering (14), let the target function to be minimized be 

0:= tr[E(x - Gy)(x -Gy)' + 2A'(I - GA)'] (18) 

where A is a Lagrange multiplier matrix. Substituting (14) in (18) we 
obtain 


Q = tr[GZuu6' + 2A'(I - GA) ' ] . (If-)) 

Now minimizing Q with respect to G 

GZuu - A'A' = 0 (20) 

G - A'A'I'i (2J) 

and multiplying both sides of (21) by A on the right-hand side and 
considering (15) 


I = A'A'IuiA (22) 

A' = (A'I-iA)-‘ (23) 

is obtained. From (21) it follows that 

G = (A'Z-iA)-*A'I-i (24) 

and 


^s'~ Gy = (A'XuiA) 'A'Xuiy. 


(25) 
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This completes the proof. 


Observe that (25) is equivalent to GLSE. Hence, if no additional 
information is available, then the GLSE is also the best linear unbiased 
predictor in the sense of (15) and (19). This predictor can now be used 
as a reference predictor against which the effect of additional 
information that are used by alternative predictors can be compared. 


5.5 BEST HOMOGENEOUSLY LINEAR PREDICTION WITH PRIOR INFORMATION 
Theorem 1 : Best Homogeneously Linear Prediction, HOME LIP, (Schaffrin, 

1983). Let Gy be a linear predictor of x in the linear model (1) - (2). 
Then the optimum value of G for which 

E(x - Gy)'(x - Gy) = min (26) 

G 

yields to the HOMBLIP x, of x 

x,:= Gy = (I + IxxN)-»{ZxxA'Euiy + [d + /i'N/x)-‘M' [ (I + IxxN)']"’ • 


(27) 

where 

N:- A'Z'iA (28) 

N:= N(I + ZxxN)“‘ = (I + (29) 

This predictor is biased and the resulting bias is given by 

b:= E(x - Gy) = (1 + m'N/z)-‘(I + ZxxN)"V (30) 

The mean square error matrix of the prediction is 

MSEP(xj) = E(x - Xi)(x - X,)' 


= (I+ZxxN)"^Zxx + (l+M'NM)~dI+ZxxN)'VM'[(I+ZxxN)“‘]' (31) 
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Since our interest is in the gain of using prior information, the 
following corollary is of interest. 


Corollary 1 : HOMBLIP Xj of x is better than the sample predictor Xg 

under Definition 2. 

Proof: Let us again consider MSEP(xi) and MSEP(Xg) given by (31) and 

(17). Then 

MSEP(Xg) - MSEP(xi) = N-‘(I - ~ (I ~ lxx^)lxx 

- (1 + ,i'N/i)-*(I - ZxxN) mm' (I - TxxN)' (32) 

Considering the following identity 

ir* - /i(l + (33) 

(32) can be written as 

MSEP(Xg) - MSEP(Xi) = (I - IxxN)fr^(fr> + ;i;t')-‘N-‘(I - I,<xN)' (34) 

Since all the matrices within the parentheses are positive definite, (32) 
is also positive definite. This completes the proof. 

Consider now the HOMBLIP Xi of x rewritten in the following 
modified form 

xi ^ [BExx + Bm( 1 + M'Np)-V'B']A'Zuiy (35) 

where 

B:= (I - = (I + ZxxN)~' (36) 

As in the case of BLE, BLUE and BLUUE, this predictor is practical if 

the expected value of the stochastic parameter and its covariance matrix 
are both available as prior information. To give a more realistic validity 

to the prediction of crustal motion parameters, it would be logical to 

assume that this prior information is of the form ptot which is an mxl 
nonzero vector, and that it is different than the true value p, i.e.. 
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Mo * m:= E(x) 


(37) 


Such deviations are quite likely to hold in crustal movement analysis; 
for instance, due to the inhomogeneous properties of the earth's crust 
where the prior information is not fully representative of the long term 
averages at different areas. Another possibility for such deviations 
could be some systematic model or measurement errors which occur in 
the process of obtaining the prior information. However, it is likely 
that this information may contain a component of reality which can be 
used to improve the prediction of the random parameters in the sense of 
Definition 2 under controlled circumstances. These conditions are the 
topic of the following discussion. 

Consider the bias vector of the prediction when ju ^ Mo* From (9a) 

bo:= E(x - Xi) = (I - GoA)m (38) 

where 

Go = [bIxx Bmo(1 MoNmo)“VoB'] A'Z uu (3y) 

and using identity (33) the following expression for the bias vector is 
obtained 

bo = BirUN-i + MoMo)'’V (40) 

Let us now investigate the improvement conditions. When mo ^ M ^he 
predictor (27), which will now be denoted by x, , has the following form 

Xi:= Goy = (I + IxxN)"‘(^xxA'niy + [(1 + MoNmo)~Vo[(I + IxxN)']"‘ 

• A'IUiy]-po} (41) 

and its corresponding MSEP is 

MSEP(xi):= E(x - x,)(x - x,)' 

= Bixx Bmo(1 + a*oNmo)~*(moN/io)moB' + bob© (42) 
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Considering the MSEP of the sample predictor Xg 

MSEP(Xg) = D(x - Xg) = D(Xg) = S n-‘B (43) 

the corresponding difference matrix, under the general condition of 
betterness, Definition 2, is 

MSEP(Xg) - MSEP(x») = b{n-» - poU + (poNmo)po 

~ [I ~ /io(l + PoNpo)“‘moN) 

• [I ~ Po(l + MoN/io)“‘moN] '|b' (44) 

This expression is p.s.d. if the matrix differences within the curly 
brackets is p.s.d. Using Theorem A.l and considering 

[fr* - Pod + p;Npo)"=*(moNpo)po] > 0 (45) 

for all po» it can be shown that (44) is p.s.d. iff 

1 (-le) 

This expression can also be written, using Theorem A.9, as 

p'(N-i + 2poPo)”*M < 1 (47) 

This result proves the following theorem. 

Theorem 2 ; The necessary and sufficient condition for the biased 
predictor x, to be better than the sample predictor Xg for all po * p 
under Definition 2 is 

p'(N“‘ + 2poPo)~V < 1 (48) 

Now observe that 

N - (ir* + 2poPo)~* > 0 (49a) 

for all po, as a result of Theorem A.7. Therefore, 
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pt'Ufi < 1 


(49b) 


is an upper bound for (48). If now, u (0 ,<t 2I)» and x ~ (u,<r5/k • I) 
where k is a proportion constant relating the variances of u (<r^) and x 
(<r*), then (49) can be written as 

<r-V'[k-*I + (A'A)->]"V < 1 (50) 

This expression is an increasing function of k and it approaches zero in 
the limit 

lim {<r3V'[k"'I + (A'A)“Mm} = 0 (51) 

k 0+ 

Therefore the following corollary holds. 

Corollary 2 : If u ^ (0,<r5l) and x (u,<r5/k • I), then a k* always exists 
such that MSEP(Xg) - MSEP(X,) > 0 for all p© * A» where 0 < k < k*. 

Now let 

z: = 

z©:= nVo 

and consider the identity 

B:= (I - ZxxN) s (I + ZxxN)“* (54) 

then, the quadratic bias from (40) and the condition of improvement 
from Theorem 2 each take the following forms respectively 

hoho = z'(I+ZoZo)“*N“’4BB'N-’i(I+ZoZo)"^z (55) 

z' (I+2zoZo)~^z < 1 

It is now easy to prove the following corollary using the same steps 
followed in Corollary 2 of BLE. 

Corollary 3 ; The quadratic bias due to the erroneous prior information 
is bounded under the condition of improvement and 


(52) 

(53) 
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sup bobo < Xm 
z' (I+2zoZo)~* ^ 1 


( 56 ) 


where is the largest eigenvalue of (I + NExx)~’N *. 


5.6 BEST HOMOGENEOUSLY LINEAR UNBIASED PREDICTION WITH PRIOR 
INFORMATION 

Theorem 1: Best Homogeneously Linear Unbiased Prediction, HOMBLUP, 

(Schaffrin, 1983). Let Gy be a linear predictor of x in the linear model 
(1) - (2), then the optimum value of G for which 


E(x - Gy)'(x - Gy) = min 

G 

(56) 

subject to 


E(x - Gy) = (I - GA)fi = 0 

(57) 

in the sense of 


tr[E(x - Gy)(x - Gy)' + 2(1 - GA)mX'] = min 

G 

(58) 

where X' is a Lagrangian multiplier vector, leads 

to the HOMBLUP Xj of x 

X2== Gy 


= (I + ZxxN)-^((ZxxA'ZGiy + [(/i'N/i)-V'{(T 

+ ZxxN)-0'A'Ej‘y]-M) 

where 

(59) 

N:= A'l'iA 

(60) 

N:= N(I + ZxxN)"‘ 

(61) 


The mean square error matrix of the prediction is given by 
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MSEP(xa) = (I + + ExxN)^M' + (I + ZxxN)'^Exx 

(62) 

Note that this predictor, similar to BLUE (section 4,6), reduces to GLSE 
as a result of the weak unbiasedness condition (57) for the univariate 
case, thereby no improvements are possible through prior information. 

Proposition 1 : HOMBLUP X 2 of x is better than the sample predictor Xg 

of X with respect to Definition 2 . 

Proof : Consider the difference of the MSEP matrices of X 2 and Xg given 

by (62) and (17) respectively 

MSEP(x 3 ) - MSEP(xa) = N-* - {(I + J:xxN)“V(m'Nm)-V'[(I + IxxN)~M' 

+ (I + ZxxN)-*Ixx) (63) 

which reduces after some manipulations to 

MSEP(Xg) - MSEP(xa) = (I - I^xN) [N~* - V'l (I " IxxN) ' (64) 

Since (I - ZxxN) = (I + ZxxN)“‘ as a result of Identity A. 9 and since the 
term within the brackets is symmetric, (64) is p.s.d. iff 

[n - m(m'Na»)“‘m ] > 0 (65) 

(Theorem A.6). Now let 

B:= fr‘ (66) 

d: = /l(/i'N^t)“‘^ (67) 

then (65) can be written as B - dd' and 

B - dd' >0 


iff 
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d'B"‘d < 1 


(Theorem A.l). Since, from (66) and (67) 
d'B"‘d = = 1 


this implies 

MSEP(Xs) - MSEP(xa) > 0. 

This completes the proof. 

As shown in Corollary 1, this predictor is better than the sample 
predictor under the MSEP criterion of preference, and it is applicable if 
there exists a valid prior information po and a covariance Exx such that 

Mo * M== E(x) (68) 

Yet this condition is seldom met in practice. Therefore, it is also 
necessary to examine the possibility of improvements when mo M (mo is 
not proportional to m)* Consider the predictor given by (59). If the 
prior information mo to be used in this expression (by replacing m with 
Mo in (59)) is different than the expected value of the random 
parameters the resulting predictor is no longer unbiased. Let us denote 
this predictor under wrong prior information by x ? then 

X. - (I + IxxN)-'((ExxA'Iuiy +[(moNmo)-*Mo((I + ZxxN)-0'A'I-iy] -Mo) 

(69) 

where now mo M and the resulting bias can be obtained from 

bo:= E(x - xa) = M - E(x 2 ) (70) 

Substitution of (69) into (70) gives the following bias vector 

bo = (I ~ XxxN) [l “ Mo(moNmo)~‘moN]m (71) 

Now, it is easy to see that for mo=m bias vanishes. Similarity for mo ® M 
the prediction remains unbiased. Consequently, the incompatibility due 
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I 


to scale differences between sample information and additional 
information on the expected values of the parameters do not have any 
influence on the predicted values. 

To investigate the conditions under which biased prediction as a 
result of wrong prior information is better than the sample prediction 
consider the MSEP of the biased predictor Xj of x with prior information 
Ho ^ H E(x) and the nonvanishing bias vector bo given by (71). Then 

MSEP(xa):= E(x - Xa)(x “ ^ 2 )' (72) 

leads to 

MSEPCxa) = (I “ IxxN)Aio(poN/io)~Vo(I “ IxxN) " + (I - ExxN)Ixx+ bobo 

(73) 

Let us also observe that 

MSEP(Xs) = DCxg) = 3 N“*(I - NE^x) (74) 

Now, under the definition of improvement as given by Definition 2 it can 
be shown, after a simple transformation, that 

MSEP(Xg) - MSEP(xa) = (I - ExxN) [b - BNmm'NB](I - (75) 

where 

B:= [n~‘ - Mo(hoNho)~^Mo1 (76) 

Since B is symmetric and p.s.d. for any ho> it can be expressed as B = 
B^B^ (Theorem A.2). Hence, (75) takes the following form 

MSEP(Xg) - MSEP(xa) ^ C(I - dd')C' (77) 

where 

(78) 

(79) 
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C:= (I - IxxN)B^ 
d:= 


Now, (77) is p.s.d. iff 


d^d = — 


MoNmo 


< 1 


as a result of Theorem A.l. However, 


sup ^ J ,,, , „ 


(80) 


(81) 


(Theorem A.3). Also 


inf 

Mo p'N/io 


(82) 


(Theorem A.4), where Xo is the minimum eigenvalue of 

nVm'N’^ (83) 

Since this matrix is of rank one, Xo = 0. Therefore, as a result (80), 
(75) is p.s.d. iff 

d'd = m'Nm < 1 (84) 


Hence, the following theorem has been proven. 

Theorem 2 : A sufficient condition for the biased predictor X 2 of x to be 

better than the sample predictor Xg of x for all /lo M ®(x) under the 
MSEP of preference is 

m'N(I + I^^N)~V < 1 (85) 

Now, if u ** (0 ,<t 5I) and x ^ (u, <r5/k • I) where k is a proportion 

constant relating the variances of u (< t ^) and x (<Tx)> then the above 

condition can be written as 

+ (A'A)-*]~‘^ < 1 (86) 
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(note that this condition is independent of po)> This expression is an 
increasing function of k~*, and it approaches zero as 

lim {<f"V [k'*I + (A'A)-‘]"‘p) = 0 (87) 

k 0+ 

Therefore the following corollary holds. 

Corollary 2 : If u (0,<r5l) and x (u, tr^/k • I), then there always 

exists a k* such that MSEP(Xg) - MSEPlxa) > 0 for all po A» where 
0 < k < k*. 

Now let 

z:= nV (88) 

2o:= nVo (89) 

then the quadratic bias, from (71), is given by 

bobo = z'(I - Zo(zoZo)~*Zo)N“^(I - IxxN)"(I ~ IxxN)N~^ 

• (I - Zo(zoZo)"*Zo)z (90) 

and the condition of improvement under erroneous prior information (85) 

transforms to 

z'z < 1 (91) 

The following corollary holds. 

Corollary 3 : The quadratic bias due to the erroneous prior information 

is bounded under the condition of improvement and 

sup bobo < (92) 

z'z < 1 

where is the largest eigenvalue of (I + 
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Proof : Considering 


sup z'[I - Zo(zo) ^Zo]z = z'z < 1 (93) 

and following Definition A*4 

sup bobo < Am (94) 

z'z < 1 

where is the largest eigenvalue of 

- NE^^)(i - (95) 

Now, using Theorem A.9, it can be shown that 

tr[N-y^I - = tr[(I - NZ^J(I - NZ^d'rT'] 

= tr[(I + (96) 

Therefore the largest eigenvalue of (95) is equal to the largest 
eigenv^ilue of (I + NExx)”^^'’^* This completes the proof. 

5,7 BEST INHOMOGENEOUSLY LINEAR UNBIASED PREDICTION WITH PRIOR 
INFORMATION 

T heorem 1 : Best Inhomogeneously Linear Unbiased Prediction, 

INHOMBLUP, (Schaffrin, 1983). Let Gy + d be an inhomogeneous linear 
predictor of x in the linear model given by (1) - (2). Then the optimum 
value of G and d for which 

E(x - Gy - d) ' f X - Gy -- d) min "97^ 

G , d 

subject to 

b:= E(x - Gy ~ d) (I “ GA)m “ d = 0 (98) 

leads to the INHOMBLUP X 3 of x, which is given by 
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X3 = (I + IxxN) ‘dxxA'Iuiy + m) 


( 99 ) 


where, N:= A'luuA. The mean square error matrix of the prediction is 
MSEP(x 3) = D(x - X 3 ) = (N + ni)~‘ (100) 

Corollary 1 : INHOMBLUP X 3 of x is a better predictor than the sample 

predictor Xg of x according to Definition 2. 

Proof ; Consider the MSEP(x 3 ), from (100), and MSEP(xg), from (17), then 
MSEP(Xg) - MSEP(x 3 ) = - (N + > 0 (101) 

following Theorem A.7. 

Again, as in the previous cases, for this predictor to be applicable 
additional information about E(x) and its covariance matrix E^x needs 
to be available. However if this information p© is not compatible, that is 
Po * Mt then the prediction is no longer unbiased ( this biased predictor 
is now denoted by and the bias of the prediction is given by 

bo:= E(x - X 3 ) = (I + ZxxN)~‘(p - Mo) (102) 

and the biased predictor is 

X 3 = (I + IxxN)-VZxxAduiy + Mo) (103) 

Considering (102) and (103) the following theorem holds. 

Theorem 2 : The necessary and sufficient condition for the biased 

predictor Ss of x to be better than the sample predictor Xg of x for all 
Mo * M according to Definition 2 is 

(p - /io)'N(p - ;.o) < 1 (104) 

Proof : The mean square matrix of the biased predictor S 3 of x is, from 

MSEP(x3) = D(x3 - x) + bobo (105) 
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where b© is given by (102) and 


D(x - X3) = (N + ni)-^ (106) 

Now the difference matrix implied by Definition 2 is 

MSEPCXg) - MSEPCxa) = - [(N + + bob©] (107) 

which reduces after some manipulations to 

MSEP(Xs) - MSEP(x 3 ) = B(ir» - ss')B' (108) 

where 

B:= (I + ZxxN)~‘ (109) 

s:= fi - fio ( 110 ) 

Therefore, (108) is p.s.d. iff s'Ns < 1 (Theorem A.l). This completes the 
proof. 

Now assume that 

u ~ (0,<r3D (111a) 

X ~ (;*, <r5/k • I) (111b) 

where k is a proportion constant. Then (104) takes the following form 

s'Ns = s'[k“*I + (A'A)“*] *s < 1 (112) 

This expression is again an increasing function of k and it approaches 
zero in the limit. Thus the following corollary holds. 

Corollary 2 ; If u ^ (0,<r2l) and x ~ (p, <r5/k • I) then there always 
exists a k* such that MSEP(Xg) - MSEP(x 3 ) > 0 for all ^© * /i where 
0 < k < k*. 

Let us now examine the amount of bias introduced by the use of 
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incompatible prior information, under the improvement condition given 
by (104). 


Corollary 3 : The total bias due to the erroneous prior information 

under the condition of improvement (104) is bounded. The supremum of 
it is equal to or less than the largest eigenvalue of (I + 
i.e., 

sup bobo < Xn, (113) 

Xo 

s.t.: s'Ns < 1 

The proof follows the similar lines given in Corollary 4.6.2. 


5,8 FURTHER DISCUSSIONS 

In the previous sections three different predictors were examined. Let 
us emphasize that prediction in technical or in everyday usage has 
different connotations. It can be defined generally as a statement 
about an unknown and uncertain event. Prediction in this study implies 
the estimation of stochastic parameters whose meaning is also 
encapsulated by the preceding general definition (for a comprehensive 
discussion, confer (Bibby and Tautenburg, 1977)). In the linear model 
described by (1) one might be interested in the estimation of the 
expected values of random parameters (classical approach) or the 
estimation of random parameters themselves (Rao, 1965; Schaffrin, 1983). 
Within the scope of crustal movement analysis the latter implies that 
short-term deformations which are regarded as stochastic with certain 
expectations over longer periods of time are of interest. Within this 
context, the above examined prediction techniques estimate short-term 
realization of random deformation parameters provided that prior 
information about their long-term behavior (their expected values and 
dispersion matrix) are known beforehand. 

It was demonstrated in section 5.4 that the GLSE technique, in the 
absence of any prior information, is also a predictor which can be used 
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TABLE 2: PREDICTORS WITH PRIOR INFORMATION 
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to estimate the random model parameters. Note that the expected values 
of random parameters can also be estimated using the estimator given in 
Lemma 1 which is similar to the GLSE but has a different dispersion 
matrix. 

Since the purpose of using additional information is to improve the 
estimates, the above predictors were compared against the sample 
predictor GLSE. If the prior information is correct then its introduction 
into the estimation with these techniques improves the results as 
compared to GLSE according to the MSEP criterion (except for HOMBLUP 
in the univariate case). 

Comparisons were also made when prior information is not compatible 
which is obviously the case in practical applications. Improvement 
conditions over sample predictor GLSE when the prior information is not 
compatible were derived. Results are summarized in Table 2. 

In this section these results are interpreted within the scope of the 
proposed algorithm as we did in Chapter 3. First some additional 
information is given about these predictors. They are then compared 
with respect to each other under correct prior information using the 
MSEP matrix criterion. Similar comparisons are not definitive with 
respect to their corresponding MSEP matrices when the prior information 
is not compatible. In this case their corresponding improvement 
conditions can be used to decide when each estimator is suitable for the 
improved estimation of deformation parameters. 

Now let us consider HOMBLIP. Using (27) and matrix identities 
given by Theorem A.9, this predictor can be written as 

Xx = [I - + /ioMo')“*]Xs = o('Xs (114) 

where 

N := N(I + XxxN)“‘ (115) 

rt := I - + MoMo')"‘ (116) 

and Xg is the sample predictor. For (H4) reduces to BLE. 

Therefore, this estimator is similar to BLE. Indeed, for the univariate 
case the expected value of Xi is 
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E(xi) = oc/i 


(117) 


where 

o( = [1 - (1 + (n/<rS) <r^ + (n/<r2) (118) 

Since 0<o(<l for all ^ := E(x), this predictor provides a shrinkage of the 
sample predictor Xg of GLSE type, and it underestimates the estimates of 
the expected values of the random parameters. Hence it is a biased 
predictor in the sense of E(xi) p. 

The usefulness of this predictor depends on the availability of prior 
information po about p and the covariance matrix Xxx ^1^® random 
parameter vector. It was shown that if this information is available and 
it is strictly correct then HOMBLIP improves the estimates with respect 
to the sample estimator (Corollary 5.5.1). Since po - p cannot be 
realized in practice, improvements over the sample estimator is possible 
if 

P iir^ + 2poPo')-^ P < 1 (119) 

as a result of Theorem 5.2.2, or if 

p'^P < 1 (120) 

which is an upper bound for (119). Under the assumptions given in 
Corollary 5.5.2, namely u "(0, (rjl) and x ""{p, (crj/k)l), where k := 
this expression reduces to 

I < 1 ( 121 ) 

k-‘ + A, 

where tj are the elements of i-C' p, and are the eigenvalues of 
(A' A)”* = CAC'. The following deductions can be made in order for 
(121) to hold 

i) Signal-to-noise ratios tf/<rg should be sufficiently small. Note that 
t, is a function of u := E(x) . 
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ii) Components of signal vector should not be very pronounced when 
Xf is small. 

iii) k should be sufficiently small (i.e., <rj is large). 

Consider now HOMBLUP given by {59) which can also be expressed 
after a simple manipulation as 

Xa = fl - N-*j^ (l - (122) 

m'N/x 

Since the elements of the matrix within parentheses are bounded, 
HOMBLUP displays robust properties with respect to an incompatible 
prior information (i.e., when p© which is used in (122) to replace 
unknown is not proportional to p). This predictor is also unbiased in 
the sense of E{xa) = ;u. However, it reduces to the sample predictor x^ 
for the univariate application, thereby offering no improvements in this 
case. 

If HOMBLIP is compared to HOMBLUP when ~ M under Definition 2, 

MSEP(Xa) - MSEP(xx) = Nmo)"' (1 + N^o)"* MoMo > 0 (123) 

holds. HOMBLUP is therefore not as efficient as HOMBLIP over Xg under 
strictly correct prior information. On the other hand, if prior 
information is not compatible, then, as in the case of estimation with 
deterministic parameters, the superiority of one predictor over the other 
cannot be established directly using Definition 2 since the resulting 
difference matrix is indefinite again. In this case the improvement 
conditions given by 

m'Nm < 1 (124) 

/i'(N~* + 2/ioMo)“V ^ 1» Mo 5^ M (125) 

for HOMBLUP and HOMBLIP respectively are more informative. Since (123) 
is an upper bound for (124), the improvement condition for HOMBLIP is 
comparatively easier to achieve when fio»0. Since (124) was used in 
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making inferences about (125) as an upper bound, the same results 
given by (121) hold also for HOMBLUP. 

The third predictor, INHOMBLUP, which was examined in section 5.7 
possesses different properties with respect to HOMBLIP and HOMBLUP. 
Using (99) and matrix identities given by Theorem A. 9, this predictor 
can be written as 

X 3 = Xs - N“^N(Xs - Mo) (126) 

Compared to (114) and (122) this predictor is sensitive to prior 
information due to the unbounded effect of mo in (126) (actually this 
statement is not strictly correct since the effect of mo can be controlled 
artificially through Exx shall examine later under certain 

simplifying assumptions). This is not desirable if the prior information 
is likely to be wrong. However large discrepancies between mo and p 
can easily be detected through the testing procedure given in section 
5.2 according to the proposed approach, and prior information may 
either be corrected or not used at all. 

In the case of small discrepancies this predictor can still be used to 
improve the estimation of stochastic parameters provided that 
improvement condition (104) derived in section 5.7 holds. 

Before we elaborate on this condition it should be noted that this 
predictor, in addition to being unbiased, is better than the other two 
predictors with respect to Definition 2 when prior information is 
compatible. That is, from (31) and (100) 


MSEP(Xi) - MSEPCxa) = (1 + m'Np)-‘(I + IxxN)“Vm'[(I + IxxN)”*]' > 0 

(127) 

Hence it is better than HOMBLIP and also HOMBLUP as a result of (123). 

Now consider the improvement condition (85) for INHOMBLUP when mo 
i M 

s' Ns < 1 (128a) 

Under the simplifying assumptions u'~(0, crgi) and x~(/i, (<r2/k)I), this 
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condition reduces, as given in (112), to 


[k-'I rA'A)"’ ]'"s < 1 ( I2Rb ' 

where k: " — and crj are the a priori variances of the disturbances 

and the stochastic parameters respectively. Using Theorem A. 5, 

= CAO", (128b) can be written as 


^ t . 2 /rt- 2 

j ^ ^ *< 1 

k”' + \i 


f 1 29 ' 


where t^ are the elements of t = C"s (note that the mxl t vector is a 
function of mxl vector s of systematic deviations rather than the 
function of fj := E(x) as in the case of HOMBLIP and HOMBLUP). are 
the eigenvalues of (A'A)“^. Now (129) is likely to hold when 


i) The bias-to-noise ratio (s's/<t5) is sufficiently small 

ii) are not small 

iii) is not too erroneous in directions where there is liMlc infoi 

mat ion (i.e., when X^ is small s, is also small) 

iv) k“^ is large (i.e., cr^ is large) 


Now let us compare these three predictors with respect to each 
other using their corresponding improvement conditions over GL3E when 
;io t If the following inequality is considered 


J_ £1 > I TOO'; 

Amin + Xi 

the improvement conditions' for HOMBLIP and HOMBLUP given by (120) 
and for INHOMBLUP given by (129) can further be simplified to 

pV < (k"‘ + Xmin) (131a) 

s's < (k-> + X^^n) (131b) 
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where is the minimum eigenvalue of (A'A)~‘. Since the left-hand 

sides of the above equations are the same, the preference of one 
estimator over the other is determined by the magnitude of the expected 
value of the random parameter vector p and the magnitude of the 
systematic errors s. 

Before we elaborate on this topic let us examine the other variables 
which may contribute to improvements over the sample predictor. Since 
in practice crustal movement observations such as repeated baseline 
measurements are performed using the most precise existing 
measurement techniques, (the a priori variances of the observed 

baseline differences) can be considered to be fixed and obviously cannot 
be used as a control variable in achieving the improvement conditions. 
Similarly, (minimum eigenvalue of (A' A)”*) would not play an 

important role in achieving the improvements since it is expected to be 
small as a result of the optimal design of deformation networks. For 
instance, the D-optimal design given in Chapter 3 renders the 
eigenvalues of (A'A)~‘ small, whereas the third variable k := can 

be used effectively to achieve the improvement conditions. 

More uncertainty can be attributed to the prior information through 
<fx» However, the larger <rj is the less the effect of prior information on 
the estimates will be. This in turn implies that the gain by using prior 
information would decrease. Although the gradual reduction of the 
effect of additional information on the estimates by this way is known in 
geodetic practice in general, the above conditions can be used 
effectively in reducing the effect of prior information efficiently. 

Since the right-hand sides of the simplified improvement conditions 
(131a) and (131b) are the same, the usefulness of the predictors 
depends on the magnitudes of a and p. Although these quantities are 
unknown in practice, the above improvement condition has already 
provided enough information about the central aspects of these 
predictors without resorting to extensive simulation studies in order to 
decide whether to incorporate prior information into the estimation of 
random deformation parameters in the improved estimation stage of the 
proposed algorithmic approach. 
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In the meantime, sample estimates pg of p computed, for instance, 
using the estimator given by Lemma 1 can be used to get some 
information about p and s to evaluate the improvement conditions, 
provided that the null hypothesis testing is not rejected. In this case, 
if AsAs ^ s"s, where (s := As ~ Mo)> then HOMBLIP and HOMBLUP are 
preferable; otherwise INHOMBLUP is advantageous in introducing prior 
information. If null hypothesis testing is rejected, prior information 
should be introduced cautiously by attributing more uncertainty to 
priors or it should not be used at all. 

In Chapter 4 the MSE matrix criterion for the estimation of 
deterministic parameters, and in this chapter its predictive version 
(MSEP), has been used to derive the improvement conditions with 
respect to the sample estimator GLSE as a result of noncompatible prior 
information. However, many different methods in comparing estimators 
are in use in statistics. 

Among these, the unbiasedness property has not been given much 
attention in this study because even the unbiased estimators turn out to 
be biased when prior information is not compatible, as readily 
demonstrated in Chapters 4 and 5. Moreover, the unbiasedness property 
is not a measure of closeness to the true value especially when the 
number of observations is not very large. 

Alternatively, the minimum variance criterion has not bee used 
since the variances of these estimators can always be made arbitrarily 
small by increasing artificially the amount of bias. Although the MSE 
and MSEP matrix criteria compensate for these deficiencies, there is no 
special reason to think that this is the best way of comparing 
estimators. It seems that they give too much weight to the case when 
the observations are not very precise or the prior information is badly 
wrong. 

The Bayesian interpretation of these estimators is also possible 
(see, for example, (Mood et al., 1974) for a Bayesian derivation of 
INHOMBLUP for normally distributed random parameters). 

Kubik ( 1986, private communication) has indicated equivariance 
properties of estimators in connection with changes in origin and scale 
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of measurements. These properties may have important ramifications in 
the selection of estimators depending on the type of observations and 
the nature of the estimates. They are defined as follows. 

If yi, ...» y„ represent measurements and a parameter being 
estimated are in the same units with these measurements, it is 
reasonable to require that an estimator x satisfies the following 
property, 

x(yi + c, .... y„ + c) = x(yi, ..., y„) + c (132) 

for every value of yj, y 2 > ...» yn and constant c. In other words, the 

estimate should increase by an amount c if each measurement increases 
by c. This property is also known as location invariance. 

Similarly an estimator x is said to be scale equivariant (scale 
invariant) if 

x(cyx, ..., cy„) = cx(y,, ..., y„) (133) 

for every yi , ..., y„ and constant c. The idea is that the estimator 

should be independent of measurement units employed. 

Now let us consider the following linear model 

y,=x+u, i=l, ...,n 

(134) 

Uj ~ (0, a^,^) 

for a location estimator of x where yi denotes observations, Uj denotes 
identically and independently distributed disturbances with a priori 
variance and x is the parameter to be estimated (stochastic or 

deterministic). Then the sample mean x, for instance, is a scale 
invariant estimator since 


cx 


? cy< 
i = x n 


c 


I Yi 

< = i 

n 


cx 


(135) 
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Similarly if a priori information about x is available, BLE of x under the 
above linear model is given by 


Xi 


(1 + 



xo^n 


)“‘ X 


( 136 ) 


This estimator is also scale equivariant (in this case changes in scale 
affect Xo and also) because 


CXj = (1 + 


c“a, 


c^Xo^n 


) ^ cx = cx 


(137) 


The scale equivariance of the other estimators and predictors can be 
demonstrated in a similar way. 

The sample mean x is also a location equivariant estimator since, 
using (132), 


X + c 


•ij (v{ + c) 


n 


n 

^ y i 
n 


+ C = X + c 


( 138 ) 


BLUUE, BLUE, HOMBLUP and INHOMBLUP are also location equivariant 
under the linear model (134). However, BLE and HOMBLIP are not 
location equivariant. The following proposition, which can be easily 
proven using the principle of reductio ad absurdum, generalizes the 
demonstration to different estimators: "If x is a location equivariant 
estimator of x then no multiple of x satisfies location equivariance 
property." Therefore BLE and HOMBLIP cannot be location equivariant 
because they can be expressed as shrinkage of the sample mean which 
is location equivariant. 

In terms of the improvement conditions for BLE and HOMBIJP this 
implies that these conditions can always be satisfied or dissatisfied by 
the proper choice of origin of measurements. However in crustal 
movement applications where measurements are generally differences of 
baseline observations or angles, any changes in the origin of these 
observations do not make much sense. Hence we need not insist on the 
location equivariance in these cases. In those applications where the 
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origin change is meaningful, these two estimators should be carefully 
handled because the estimates can arbitrarily be changed when the 
origin of the measurements is changed. 
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Chapter 6 

CONCLUSIONS AND RECOMMENDATIONS 


In recent years, the analysis of crustal deformation measurements has 
become important as a result of current improvements in the geodetic 
methods and an increasing amount of theoretical and observational data 
provided by several earth sciences. However, a combined analysis of 
different types of information has not received proper attention. 

In this study, a "first generation" data analysis algorithm which 
combines extraneous information with current geodetic measurements was 
proposed. Relevant methods which can be used in the algorithm have 
been discussed. Prior information is the unifying feature of this 
algorithm. Some of the problems which may arise through the use of 
additional information in the analysis have been indicated and 
preventive measures were demonstrated. 

The first step of the algorithm is the optimal design of the geodetic 
networks. Although the current research on optimal designs of geodetic 
networks is extensive, deformation model oriented network designs have 
yet to be exploited. The duality between geodetic axiomatic model 
oriented designs and deformation model oriented designs deserves 
further investigations. As an example to deformation model oriented 
designs, it was shown that the regular polygonal deformation networks 
composed of equilateral triangles are uniformly D-optimal for 
homogeneous deformation field. The methodology used in this derivation 
can easily be extended to the optimal design of possible alternative 
deformation models which may arise in practice. 

The concept of optimal network designs, which does well for the 
totality of different postulated models, was identified. Such designs can 
in general be constructed iteratively using prior preferences as weights 
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on each model. Formulation of the problem for deformation networks is 
open to further investigations. 

The second step in the algorithm identifies the descriptive model of 
the deformation field. Previous information about the nature of crustal 
movements as a result of previous measurements and/or theoretical 
considerations may suggest more than one model to represent the 
deformations. In this case, sequential experimental designs are 
necessary to identify the suitable model. A method based on the 
entropy measure of information, proposed by Box and Hill (1967), was 
applied to a group of postulated deformation models and the 
identification of the correct model was demonstrated through an example. 
The method uses prior information in the form of prior preferences for 
each model in a Bayesian setting. This additional information, in turn, 
allows prediction of optimal observations and computation of the 
likelihood of each model. Although the numerical example indicated that 
the method effectively identifies the correct model, its usefulness needs 
to be demonstrated in practice. 

The next step in the algorithm is the improved estimation of 
deformation parameters. Although these parameters are estimated in the 
process of model discrimination, they can further be improved by the 
use of extraneous information about them. Compared to the previous 
topics and the regular estimation techniques, the use of additional 
information in the estimation of deformation parameters has not been 
exploited in detail in crustal movement analysis. Therefore, a major part 
of this study has been devoted to this subject. 

According to the proposed algorithm, prior information must first be 
checked through null-hypothesis testing against the estimates calculated 
using the least squares method, which employs only sample observations, 
before it is introduced to the final estimation. This procedure is likely 
to detect large discrepancies between two different estimates, but may 
not be conclusive if both estimates are generally in the same direction 
and close in magnitude. However, it was demonstrated that even in 
these cases, incompatible prior information can still be used to improve 
the final estimates under certain circumstances. 
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Introduction of prior information can be achieved using different 
estimation techniques. Schaffrin (1983) proposed a group of estimators 
that can be used for such purposes. Of these techniques, Best Linear 
Uniformly Unbiased Estimator BLUUE, Best Linear Unbiased Estimator 
BLUE, and Best Linear Estimator BLE, were examined due to their rather 
unknown statistical properties in the geodetic area. 

Since the purpose of using prior information is to improve the 
estimates in the proposed algorithm, these estimators were analytically 
compared against GLSE. It was shown that they are in general better 
than GLSE with respect to their corresponding mean square error 
matrices. However, these inferences can only be made as long as prior 
information is known adequately. Otherwise, possible discrepancies 
between prior information and the true value of parameters render 
BLUUE and BLUE biased and introduces additional biases in the biased 
estimator BLE. Under these circumstances the advantage of using 
additional information was investigated. Comparisons of these estimators 
under spurious prior information against GLSE led to the conditions for 
improvements (Table 1). These results also provide the means to select 
a suitable estimator under wrong prior information. In general, 
improvements are possible for biased BLUUE if prior information is not 
too erroneous or observations relatively poor. Meanwhile, it was shown 
that BLUE and BLE are robust against wrong prior information. If the 
true values of the parameters are relatively small with respect to the 
observation noise, then improvements over GLSE are possible by using 
these estimators. 

As discussed in Chapter 5, deformation parameters can be regarded 
as random quantities with certain means and variances. This 
interpretation led to the study of estimation of stochastic parameters 
(prediction). In this class. Best Inhomogeneously Linear Unbiased 
Prediction INHOMBLUP, Best Homogeneously Linear Unbiased Prediction 
HOMBLUP and Best Homogeneously Linear Prediction HOMBLIP were 
considered. A null hypothesis testing procedure was developed to check 
the compatibility of prior information about the means of the stochastic 
deformation parameters with the estimates implied by the sample 
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observations. It was demonstrated that, as a reference criterion, GLSE 
is also a Best Homogeneously Linear Unbiased Prediction if no prior 
information is available. The above predictors were then compared 
against GLSE using the mean square matrix of prediction criterion 
assuming that correct prior information on the mean of the true 
stochastic parameter vector and its covariance matrix are available. In 
each case, it was shown that prior information improves the estimation 
of stochastic parameters. 

Following the similar discussions in the case of estimation of 
deterministic parameters with erroneous prior information, once again 
the conditions for improvements were derived (Table 2). It was found 
that these predictors not only possess the properties of the estimators 
with deterministic parameters examined, but they also enjoy the 
additional flexibility provided by the stochastic interpretation of 
deformation parameters through their covariance matrices. 

Finally, analytic conditions derived for these estimators and 
predictors are not only useful for understanding their properties but 
are also important in deciding which estimator or predictor is to be 
used in the improved estimation stage of the proposed algorithm. These 
conditions also replace extensive simulation studies for different 
applications. Particular cases can easily be inferred from the general 
conditions (Tables 1 and 2). 
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Appendix 

SOME USEFUL THEOREMS AND IDENTITTES 

Theorem A.l : Let I be the nxn identity matrix and g an nxl vector. 

Then 

I - gg' > 0 iff g'g < 1 
For proof see (Yancey et aL, 1974). 

Th eorem A. 2 : Let A be an nxn symmetric matrix; if A > 0, then A can be 

written as 

A ^ 


and A^* is also symmetric. For proof see (Mardia et a]., 1979), 
Theorem A.3 : Let A be an nxn matrix and U an nxl vector. Then 


sup ( ir X ) ^ 
X x' Ax 


U' A'UI 


and the supremum is attained at x = A^^U. For proof see (Rao, 1979), 

Theorem A. 4 : Let A bo an nxn matrix and > Xo ••• ^ 

eigenvalues then 


sup x" Ax 
X x' X 


Ai 


inf X ' Ax 
X x"x 


For proof see (Rao, 1973), 


Theorem A.5 : Let A be an nxn symmetric matrix. Then there exists an 

orthogonal matrix P such that 
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P'AP = A 


with P'P = PP' = I, where A = diag(Xi, A„) are the eigenvalues of A. 
For proof see (Mardia et al., 1979). 

Theorem A.6 : Let A be an nxn p.s.d. matrix; then PAP' is p.s.d. for any 

nxn matrix P. For proof see (Graybill, 1976). 

Theorem A.7 : If A > 0 and B > 0, then A“* - (A + B)~‘ is p.s.d. For 

proof see (Goldberger, 1964). 

Theorem A.8 : Let A be an nxn symmetric matrix; then 

tr{A) = 

where X) are the eigenvalues of A. For proof apply theorem A. 5. 

Theorem A.9 : If A is an nxn matrix, p.d. and (A + BDC) > 0 and BDC are 

conforming matrices then 


(A + BDC)-‘ = A~^ - A“*BDCA-‘(I + BDCA“‘)"‘ 

= A~‘ - (I + A-»BDC)-‘A"‘BDCA-i 

holds. For the derivation see (Henderson and Searle, 1981). U.seful 
extensions of this theorem in the general notation of the text are 

A'duu + AI>,xA')-*A = N - N(I + I,<xN)"‘IxxN = 

= N(I + ZxxN)"* = (I + NExx)"‘N =: N 


and for a conforming vector x 

N xx'N 


(N~^ + xx' ) * = N - 


1 + x'Nx 


where 


N := A'ZuuA 


Theorem A. 10 : If x ^ Np(/;(,I), Z > 0, then 
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x'r‘x - xp(m'tv) 

(x - m) (x - p) Xp 
F or proof see (Tautenburg, 1982). 

Theorem A. 11 : Let pxl random vector x N(/i, E) where E is 

Let A be a qxp matrix of constants, and let b be any qxl 
constants. Then 

y = Ax + b ~ Np(A>i + b, AEA^ ) 

For proof, see (Graybill, 1976). 


full rank, 
vector of 
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